Re: Looking for direction
You're the second to recommend this to me. I ended up picking it up last week, so I need to sit down with it. I was able to get a working project; however, I don't fully grasp the details of how. So the book will help, I'm sure. Thank you.

On 05/20/2015 05:50 AM, darnold via Python-list wrote:
> I recommend getting your hands on "Automate The Boring Stuff With Python"
> from No Starch Press: http://www.nostarch.com/automatestuff
>
> I've not read it in its entirety, but it's very beginner-friendly and is
> targeted at just the sort of processing you appear to be doing.
>
> HTH,
> Don
--
https://mail.python.org/mailman/listinfo/python-list
Re: Looking for direction
I recommend getting your hands on "Automate The Boring Stuff With Python" from No Starch Press: http://www.nostarch.com/automatestuff

I've not read it in its entirety, but it's very beginner-friendly and is targeted at just the sort of processing you appear to be doing.

HTH,
Don
--
https://mail.python.org/mailman/listinfo/python-list
Re: Looking for direction
Maybe we can change this list to a dict, using item[0] and item[1] as the key and the whole item as the value; then you can update by the same key, I think.

Tim Chase wrote on Fri, 15 May 2015 at 01:17:
> On 2015-05-14 09:57, 20/20 Lab wrote:
> > On 05/13/2015 06:23 PM, Steven D'Aprano wrote:
> >>> I have a LARGE csv file that I need to process. 110+ columns,
> >>> 72k rows. I managed to write enough to reduce it to a few
> >>> hundred rows, and the five columns I'm interested in.
> > I actually stumbled across the csv module after coding enough to
> > make a list of lists. So that is more the reason I approached the
> > list; nothing like spending hours (or days) coding something that
> > already exists and you just don't know about.
> >>> Now is where I have my problem:
> >>>
> >>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
> >>>            [72976, "YYY", "Item", "Qty", "Noise"],
> >>>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
> >>>
> >>> Basically, I need to check for rows with duplicate accounts
> >>> (row[0]) and staff (row[1]), and if so, remove that row and add
> >>> its Qty to the original row. I really don't have a clue how to
> >>> go about this.
> >>
> >> processed = {}  # hold the processed data in a dict
> >>
> >> for row in myList:
> >>     account, staff = row[0:2]
> >>     key = (account, staff)  # Put them in a tuple.
> >>     if key in processed:
> >>         # We've already seen this combination.
> >>         processed[key][3] += row[3]  # Add the quantities.
> >>     else:
> >>         # Never seen this combination before.
> >>         processed[key] = row
> >>
> >> newlist = list(processed.values())
> >>
> > It does, immensely. I'll make this work. Thank you again for the
> > link from yesterday and apologies for hitting the wrong reply
> > button. I'll have to study more on the usage and implementations
> > of dictionaries and tuples.
>
> In processing the initial CSV file, I suspect that using a
> csv.DictReader would make the code a bit cleaner. Additionally, as
> you're processing through the initial file, unless you need the
> intermediate data, you should be able to do it in one pass.
> Something like
>
>     HEADER_ACCOUNT = "account"
>     HEADER_STAFF = "staff"
>     HEADER_QTY = "Qty"
>
>     processed = {}
>     with open("data.csv") as f:
>         reader = csv.DictReader(f)
>         for row in reader:
>             if should_process_row(row):
>                 account = row[HEADER_ACCOUNT]
>                 staff = row[HEADER_STAFF]
>                 qty = row[HEADER_QTY]
>                 try:
>                     row[HEADER_QTY] = qty = int(qty)
>                 except Exception:
>                     # not a numeric quantity?
>                     continue
>                 # from Steven's code
>                 key = (account, staff)
>                 if key in processed:
>                     processed[key][HEADER_QTY] += qty
>                 else:
>                     processed[key] = row
>     do_something_with(processed.values())
>
> I find that using names is a lot clearer than using arbitrary
> indexing. Barring that, using indexes-as-constants still would
> add further clarity.
>
> -tkc
--
https://mail.python.org/mailman/listinfo/python-list
Re: Looking for direction
On 05/13/2015 06:12 PM, Dave Angel wrote:
On 05/13/2015 08:45 PM, 20/20 Lab wrote:

You accidentally replied to me, rather than the mailing list. Please
use reply-list, or if your mailer can't handle that, do a Reply-All,
and remove the parts you don't want.

> On 05/13/2015 05:07 PM, Dave Angel wrote:
>> On 05/13/2015 07:24 PM, 20/20 Lab wrote:
>>> I'm a beginner to python. Reading here and there. Written a couple of
>>> short and simple programs to make life easier around the office.
>>>
>> Welcome to Python, and to this mailing list.
>>
>>> That being said, I'm not even sure what I need to ask for. I've never
>>> worked with external data before.
>>>
>>> I have a LARGE csv file that I need to process. 110+ columns, 72k
>>> rows.
>>
>> That's not very large at all.
>>
> In the grand scheme, I guess not. However, I'm currently doing this
> whole process using office. So it can be a bit daunting.

I'm not familiar with the "office" operating system.

>>> I managed to write enough to reduce it to a few hundred rows, and
>>> the five columns I'm interested in.
>>>
>>> Now is where I have my problem:
>>>
>>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>>            [72976, "YYY", "Item", "Qty", "Noise"],
>>>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>>>
>> It'd probably be useful to identify names for your columns, even if
>> it's just in a comment. Guessing from the paragraph below, I figure
>> the first two columns are "account" & "staff".
>
> The columns that I pull are Account, Staff, Item Sold, Quantity Sold,
> and notes about the sale (the notes aren't particularly needed, but the
> higher-ups would like them in the report).
>
>>> Basically, I need to check for rows with duplicate accounts (row[0]) and
>>> staff (row[1]), and if so, remove that row and add its Qty to the
>>> original row.
>>
>> And which column is that supposed to be? Shouldn't there be a number
>> there, rather than a string?
>>
>>> I really don't have a clue how to go about this. The
>>> number of rows changes based on which run it is, so I couldn't even get
>>> away with using hundreds of compare loops.
>>>
>>> If someone could point me to some documentation on the functions I would
>>> need, or a tutorial, it would be a great help.
>>>
>> Is the order significant? Do you have to preserve the order in which the
>> accounts appear? I'll assume not.
>>
>> Have you studied dictionaries? Seems to me the way to handle the
>> problem is to read in a row, and create a dictionary with a key of
>> (account, staff), and data of the rest of the line.
>>
>> Each time you read a row, you check whether the key is already in the
>> dictionary. If not, add it. If it's already there, merge the data as
>> you say.
>>
>> Then when you're done, turn the dict back into a list of lists.
>>
> The order is irrelevant. No, I've not really studied dictionaries, but
> a few people have mentioned them. I'll have to read up on them and, more
> importantly, their applications. Seems that they are more versatile
> than I thought.
>
> Thank you.

You have to realize that a tuple can be used as a key, in your case a
tuple of Account and Staff. You'll have to decide how you're going to
merge the ItemSold, QuantitySold, and notes.

Tells you how often I actually talk in mailing lists. My apologies, and
thank you again.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Looking for direction
On 05/13/2015 06:12 PM, Dave Angel wrote:
> On 05/13/2015 08:45 PM, 20/20 Lab wrote:
> You accidentally replied to me, rather than the mailing list. Please
> use reply-list, or if your mailer can't handle that, do a Reply-All,
> and remove the parts you don't want.

...and now that you mention it, I appear to have done that with all of my replies yesterday. My deepest apologies for that.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Looking for direction
On 2015-05-14 09:57, 20/20 Lab wrote:
> On 05/13/2015 06:23 PM, Steven D'Aprano wrote:
>>> I have a LARGE csv file that I need to process. 110+ columns,
>>> 72k rows. I managed to write enough to reduce it to a few
>>> hundred rows, and the five columns I'm interested in.
> I actually stumbled across the csv module after coding enough to
> make a list of lists. So that is more the reason I approached the
> list; nothing like spending hours (or days) coding something that
> already exists and you just don't know about.
>>> Now is where I have my problem:
>>>
>>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>>            [72976, "YYY", "Item", "Qty", "Noise"],
>>>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>>>
>>> Basically, I need to check for rows with duplicate accounts
>>> (row[0]) and staff (row[1]), and if so, remove that row and add
>>> its Qty to the original row. I really don't have a clue how to
>>> go about this.
>>
>> processed = {}  # hold the processed data in a dict
>>
>> for row in myList:
>>     account, staff = row[0:2]
>>     key = (account, staff)  # Put them in a tuple.
>>     if key in processed:
>>         # We've already seen this combination.
>>         processed[key][3] += row[3]  # Add the quantities.
>>     else:
>>         # Never seen this combination before.
>>         processed[key] = row
>>
>> newlist = list(processed.values())
>>
> It does, immensely. I'll make this work. Thank you again for the
> link from yesterday and apologies for hitting the wrong reply
> button. I'll have to study more on the usage and implementations
> of dictionaries and tuples.

In processing the initial CSV file, I suspect that using a
csv.DictReader would make the code a bit cleaner. Additionally, as
you're processing through the initial file, unless you need the
intermediate data, you should be able to do it in one pass.
Something like

    HEADER_ACCOUNT = "account"
    HEADER_STAFF = "staff"
    HEADER_QTY = "Qty"

    processed = {}
    with open("data.csv") as f:
        reader = csv.DictReader(f)
        for row in reader:
            if should_process_row(row):
                account = row[HEADER_ACCOUNT]
                staff = row[HEADER_STAFF]
                qty = row[HEADER_QTY]
                try:
                    row[HEADER_QTY] = qty = int(qty)
                except Exception:
                    # not a numeric quantity?
                    continue
                # from Steven's code
                key = (account, staff)
                if key in processed:
                    processed[key][HEADER_QTY] += qty
                else:
                    processed[key] = row
    do_something_with(processed.values())

I find that using names is a lot clearer than using arbitrary
indexing. Barring that, using indexes-as-constants still would
add further clarity.

-tkc
--
https://mail.python.org/mailman/listinfo/python-list
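[Editor's note: a self-contained, runnable version of Tim's sketch. The inline CSV text, the header names, and the keep-everything `should_process_row` are assumptions for the demo; Tim's fragment also needs `import csv` to run.]

```python
import csv
import io

# Inline CSV standing in for data.csv; these headers are assumed.
CSV_TEXT = """account,staff,Qty,item
123,XXX,5,Item
72976,YYY,3,Item
123,XXX,2,ItemTypo
"""

HEADER_ACCOUNT = "account"
HEADER_STAFF = "staff"
HEADER_QTY = "Qty"

def should_process_row(row):
    # Stand-in for whatever filtering reduced 72k rows to a few hundred.
    return True

processed = {}
reader = csv.DictReader(io.StringIO(CSV_TEXT))
for row in reader:
    if should_process_row(row):
        try:
            row[HEADER_QTY] = qty = int(row[HEADER_QTY])
        except ValueError:
            continue  # not a numeric quantity
        key = (row[HEADER_ACCOUNT], row[HEADER_STAFF])
        if key in processed:
            # Duplicate (account, staff): fold the quantity into the kept row.
            processed[key][HEADER_QTY] += qty
        else:
            processed[key] = row

results = list(processed.values())
```

With the sample rows above, the two `(123, XXX)` lines collapse into one row whose Qty is 7.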
Re: Looking for direction
On Thu, May 14, 2015 3:35 PM CEST, Dennis Lee Bieber wrote:
> On Wed, 13 May 2015 16:24:30 -0700, 20/20 Lab declaimed the following:
>
>> Now is where I have my problem:
>>
>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>            [72976, "YYY", "Item", "Qty", "Noise"],
>>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>>
>> Basically, I need to check for rows with duplicate accounts (row[0]) and
>> staff (row[1]), and if so, remove that row and add its Qty to the
>> original row. I really don't have a clue how to go about this. The
>> number of rows changes based on which run it is, so I couldn't even get
>> away with using hundreds of compare loops.
>>
>> If someone could point me to some documentation on the functions I would
>> need, or a tutorial, it would be a great help.
>
> This appears to be a matter of algorithm development -- there won't be
> a pre-made "function" for it. The closest would be the summing functions
> (control break: http://en.wikipedia.org/wiki/Control_break ) of a report
> writer application.
>
> The short gist would be:
>
>     SORT the data by the account field
>     Initialize sum using first record
>     loop
>         read next record
>         if end of data
>             output sum record
>             exit
>         if record is same account as sum
>             add quantity to sum
>         else
>             output sum record
>             reset sum to the new record
>
> Granted -- loading the data into an SQL-capable database would make
> this simple...
>
>     select account, sum(quantity) from table
>     group by account
>     order by account

You could also use pandas. Read the data into a DataFrame, create a groupby object, and use the sum() and first() methods.

http://pandas.pydata.org/pandas-docs/version/0.15.2/groupby.html
--
https://mail.python.org/mailman/listinfo/python-list
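[Editor's note: a minimal sketch of the pandas approach just described. The column names and the numeric quantities (in place of the thread's "Qty" placeholder strings) are assumptions for illustration.]

```python
import pandas as pd

# Sample rows from the thread, with assumed numeric quantities.
rows = [[123, "XXX", "Item", 5, "Noise"],
        [72976, "YYY", "Item", 3, "Noise"],
        [123, "XXX", "ItemTypo", 2, "Noise"]]
df = pd.DataFrame(rows, columns=["Account", "Staff", "Item", "Qty", "Notes"])

# Collapse duplicate (Account, Staff) pairs: sum the quantities and
# keep the first-seen value of each remaining column.
merged = df.groupby(["Account", "Staff"], as_index=False).agg(
    {"Item": "first", "Qty": "sum", "Notes": "first"})
```

This is the groupby/sum()/first() combination from the linked docs, written as a single `agg` call.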
Re: Looking for direction
On 05/13/2015 06:23 PM, Steven D'Aprano wrote:
> On Thu, 14 May 2015 09:24 am, 20/20 Lab wrote:
>> I'm a beginner to python. Reading here and there. Written a couple of
>> short and simple programs to make life easier around the office.
>>
>> That being said, I'm not even sure what I need to ask for. I've never
>> worked with external data before.
>>
>> I have a LARGE csv file that I need to process. 110+ columns, 72k
>> rows. I managed to write enough to reduce it to a few hundred rows, and
>> the five columns I'm interested in.
>
> That's not large. Large is millions of rows, or tens of millions if you
> have enough memory. What's large to you and me is usually small to the
> computer.
>
> You should use the csv module for handling the CSV file, if you aren't
> already doing so. Do you need a url to the docs?

I actually stumbled across the csv module after coding enough to make a
list of lists. So that is more the reason I approached the list; nothing
like spending hours (or days) coding something that already exists and
you just don't know about.

>> Now is where I have my problem:
>>
>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>            [72976, "YYY", "Item", "Qty", "Noise"],
>>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>>
>> Basically, I need to check for rows with duplicate accounts (row[0]) and
>> staff (row[1]), and if so, remove that row and add its Qty to the
>> original row. I really don't have a clue how to go about this.
>
> Is the order of the rows important? If not, the problem is simpler.
>
>     processed = {}  # hold the processed data in a dict
>
>     for row in myList:
>         account, staff = row[0:2]
>         key = (account, staff)  # Put them in a tuple.
>         if key in processed:
>             # We've already seen this combination.
>             processed[key][3] += row[3]  # Add the quantities.
>         else:
>             # Never seen this combination before.
>             processed[key] = row
>
>     newlist = list(processed.values())
>
> Does that help?

It does, immensely. I'll make this work. Thank you again for the link
from yesterday and apologies for hitting the wrong reply button. I'll
have to study more on the usage and implementations of dictionaries and
tuples.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Looking for direction
On Thu, 14 May 2015 09:24 am, 20/20 Lab wrote:
> I'm a beginner to python. Reading here and there. Written a couple of
> short and simple programs to make life easier around the office.
>
> That being said, I'm not even sure what I need to ask for. I've never
> worked with external data before.
>
> I have a LARGE csv file that I need to process. 110+ columns, 72k
> rows. I managed to write enough to reduce it to a few hundred rows, and
> the five columns I'm interested in.

That's not large. Large is millions of rows, or tens of millions if you
have enough memory. What's large to you and me is usually small to the
computer.

You should use the csv module for handling the CSV file, if you aren't
already doing so. Do you need a url to the docs?

> Now is where I have my problem:
>
> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>            [72976, "YYY", "Item", "Qty", "Noise"],
>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>
> Basically, I need to check for rows with duplicate accounts (row[0]) and
> staff (row[1]), and if so, remove that row and add its Qty to the
> original row. I really don't have a clue how to go about this.

Is the order of the rows important? If not, the problem is simpler.

    processed = {}  # hold the processed data in a dict

    for row in myList:
        account, staff = row[0:2]
        key = (account, staff)  # Put them in a tuple.
        if key in processed:
            # We've already seen this combination.
            processed[key][3] += row[3]  # Add the quantities.
        else:
            # Never seen this combination before.
            processed[key] = row

    newlist = list(processed.values())

Does that help?

-- 
Steven
--
https://mail.python.org/mailman/listinfo/python-list
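[Editor's note: Steven's snippet, run against the thread's sample data with numeric quantities substituted for the "Qty" placeholder strings (an assumption for the demo, since the additions only make sense on numbers):]

```python
# Sample rows: [account, staff, item, qty, notes].
myList = [[123, "XXX", "Item", 5, "Noise"],
          [72976, "YYY", "Item", 3, "Noise"],
          [123, "XXX", "ItemTypo", 2, "Noise"]]

processed = {}  # hold the processed data in a dict

for row in myList:
    account, staff = row[0:2]
    key = (account, staff)  # Put them in a tuple.
    if key in processed:
        # We've already seen this combination.
        processed[key][3] += row[3]  # Add the quantities.
    else:
        # Never seen this combination before.
        processed[key] = row

newlist = list(processed.values())
# The two (123, "XXX") rows collapse into one row with qty 5 + 2 = 7.
```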
Re: Looking for direction
On 05/13/2015 08:45 PM, 20/20 Lab wrote:

You accidentally replied to me, rather than the mailing list. Please
use reply-list, or if your mailer can't handle that, do a Reply-All,
and remove the parts you don't want.

> On 05/13/2015 05:07 PM, Dave Angel wrote:
>> On 05/13/2015 07:24 PM, 20/20 Lab wrote:
>>> I'm a beginner to python. Reading here and there. Written a couple of
>>> short and simple programs to make life easier around the office.
>>>
>> Welcome to Python, and to this mailing list.
>>
>>> That being said, I'm not even sure what I need to ask for. I've never
>>> worked with external data before.
>>>
>>> I have a LARGE csv file that I need to process. 110+ columns, 72k
>>> rows.
>>
>> That's not very large at all.
>>
> In the grand scheme, I guess not. However, I'm currently doing this
> whole process using office. So it can be a bit daunting.

I'm not familiar with the "office" operating system.

>>> I managed to write enough to reduce it to a few hundred rows, and
>>> the five columns I'm interested in.
>>>
>>> Now is where I have my problem:
>>>
>>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>>            [72976, "YYY", "Item", "Qty", "Noise"],
>>>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>>>
>> It'd probably be useful to identify names for your columns, even if
>> it's just in a comment. Guessing from the paragraph below, I figure
>> the first two columns are "account" & "staff".
>
> The columns that I pull are Account, Staff, Item Sold, Quantity Sold,
> and notes about the sale (the notes aren't particularly needed, but the
> higher-ups would like them in the report).
>
>>> Basically, I need to check for rows with duplicate accounts (row[0]) and
>>> staff (row[1]), and if so, remove that row and add its Qty to the
>>> original row.
>>
>> And which column is that supposed to be? Shouldn't there be a number
>> there, rather than a string?
>>
>>> I really don't have a clue how to go about this. The
>>> number of rows changes based on which run it is, so I couldn't even get
>>> away with using hundreds of compare loops.
>>>
>>> If someone could point me to some documentation on the functions I would
>>> need, or a tutorial, it would be a great help.
>>>
>> Is the order significant? Do you have to preserve the order in which the
>> accounts appear? I'll assume not.
>>
>> Have you studied dictionaries? Seems to me the way to handle the
>> problem is to read in a row, and create a dictionary with a key of
>> (account, staff), and data of the rest of the line.
>>
>> Each time you read a row, you check whether the key is already in the
>> dictionary. If not, add it. If it's already there, merge the data as
>> you say.
>>
>> Then when you're done, turn the dict back into a list of lists.
>>
> The order is irrelevant. No, I've not really studied dictionaries, but
> a few people have mentioned them. I'll have to read up on them and, more
> importantly, their applications. Seems that they are more versatile
> than I thought.
>
> Thank you.

You have to realize that a tuple can be used as a key, in your case a
tuple of Account and Staff. You'll have to decide how you're going to
merge the ItemSold, QuantitySold, and notes.

-- 
DaveA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Looking for direction
On 2015-05-14 01:06, Ethan Furman wrote:
> On 05/13/2015 04:24 PM, 20/20 Lab wrote:
>> I'm a beginner to python. Reading here and there. Written a couple of
>> short and simple programs to make life easier around the office.
>>
>> That being said, I'm not even sure what I need to ask for. I've never
>> worked with external data before.
>>
>> I have a LARGE csv file that I need to process. 110+ columns, 72k rows.
>> I managed to write enough to reduce it to a few hundred rows, and the
>> five columns I'm interested in.
>>
>> Now is where I have my problem:
>>
>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>            [72976, "YYY", "Item", "Qty", "Noise"],
>>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>>
>> Basically, I need to check for rows with duplicate accounts (row[0]) and
>> staff (row[1]), and if so, remove that row and add its Qty to the
>> original row. I really don't have a clue how to go about this. The
>> number of rows changes based on which run it is, so I couldn't even get
>> away with using hundreds of compare loops.
>>
>> If someone could point me to some documentation on the functions I would
>> need, or a tutorial, it would be a great help.
>
> You could try using a dictionary, combining when needed:
>
>     # untested
>     data = {}
>     for row in all_rows:
>         key = row[0], row[1]
>         if key in data:
>             item, qty, noise = data[key]
>             qty += row[3]
>         else:
>             item, qty, noise = row[2:]
>         data[key] = item, qty, noise
>
>     for (account, staff), (item, qty, noise) in data.items():
>         do_stuff_with(account, staff, item, qty, noise)
>
> At the end, data should have what you want. It won't, however, be in
> the same order, so hopefully that's not an issue for you.

Starting from that, if the order matters, you can do it this way:

    data = {}
    order = {}
    for index, row in enumerate(all_rows):
        key = row[0], row[1]
        if key in data:
            item, qty, noise = data[key]
            qty += row[3]
        else:
            item, qty, noise = row[2:]
        data[key] = item, qty, noise
        order.setdefault(key, index)

    merged_rows = [(account, staff, item, qty, noise)
                   for (account, staff), (item, qty, noise) in data.items()]

    def original_order(row):
        key = row[0], row[1]
        return order[key]

    merged_rows.sort(key=original_order)
--
https://mail.python.org/mailman/listinfo/python-list
Re: Looking for direction
On 05/13/2015 04:24 PM, 20/20 Lab wrote:
> I'm a beginner to python. Reading here and there. Written a couple of
> short and simple programs to make life easier around the office.
>
> That being said, I'm not even sure what I need to ask for. I've never
> worked with external data before.
>
> I have a LARGE csv file that I need to process. 110+ columns, 72k rows.
> I managed to write enough to reduce it to a few hundred rows, and the
> five columns I'm interested in.
>
> Now is where I have my problem:
>
> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>            [72976, "YYY", "Item", "Qty", "Noise"],
>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>
> Basically, I need to check for rows with duplicate accounts (row[0]) and
> staff (row[1]), and if so, remove that row and add its Qty to the
> original row. I really don't have a clue how to go about this. The
> number of rows changes based on which run it is, so I couldn't even get
> away with using hundreds of compare loops.
>
> If someone could point me to some documentation on the functions I would
> need, or a tutorial, it would be a great help.

You could try using a dictionary, combining when needed:

    # untested
    data = {}
    for row in all_rows:
        key = row[0], row[1]
        if key in data:
            item, qty, noise = data[key]
            qty += row[3]
        else:
            item, qty, noise = row[2:]
        data[key] = item, qty, noise

    for (account, staff), (item, qty, noise) in data.items():
        do_stuff_with(account, staff, item, qty, noise)

At the end, data should have what you want. It won't, however, be in
the same order, so hopefully that's not an issue for you.

-- 
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list
Re: Looking for direction
20/20 Lab writes:
> I'm a beginner to python. Reading here and there. Written a couple of
> short and simple programs to make life easier around the office.

Welcome, and congratulations on self-educating to this point.

> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>            [72976, "YYY", "Item", "Qty", "Noise"],
>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>
> Basically, I need to check for rows with duplicate accounts (row[0]) and
> staff (row[1]), and if so, remove that row and add its Qty to the
> original row. I really don't have a clue how to go about this.

You might benefit from doing some simple study of algorithms, with
exercises so you can test your knowledge and learn new ways of thinking
about basic algorithms.

I say that because what will be most helpful in the situation you're
facing is to *already* have learned to think about already-solved
algorithms like a toolkit. And the only way to get that toolkit is to do
some study, rather than solving each problem in the real world as it
comes along.

In this case, you are stuck IMO because the process you want to perform
on the data needs to be formally expressed. Here's an attempt:

    For each unique pair (‘account_nr’, ‘staff_name’):
        Sum all the ‘qty’ values as ‘total_qty’
        Emit a record (‘account_nr’, ‘staff_name’, ‘total_qty’)

Once expressed that way, it becomes clear to me that the requirements as
stated have a gap. What becomes of the ‘item’, ‘noise’, etc. values for
the same (‘account_nr’, ‘staff_name’) pair? Are they simply discarded as
uninteresting? If not discarded, how are they processed to make a single
record for that (‘account_nr’, ‘staff_name’) pair?

You don't have to respond to me with the answers. But you will need to
deal with that issue, and probably others. The advantage of formally
stating the process you want is to debug it before even writing a line
of code to solve it.

Here is a course on problem solving with algorithms that uses Python:
<http://interactivepython.org/runestone/static/pythonds/index.html>

Good hunting!

-- 
 \  “We cannot solve our problems with the same thinking we used |
  `\  when we created them.” —Albert Einstein                    |
_o__)                                                            |
Ben Finney
--
https://mail.python.org/mailman/listinfo/python-list
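[Editor's note: the formal process Ben states could be sketched like this. The sample rows and numeric quantities are assumptions for illustration, and the ‘item’/‘noise’ columns are deliberately dropped, which is exactly the requirements gap Ben points out.]

```python
from collections import defaultdict

# Sample rows: (account_nr, staff_name, item, qty, noise).
rows = [(123, "XXX", "Item", 5, "Noise"),
        (72976, "YYY", "Item", 3, "Noise"),
        (123, "XXX", "ItemTypo", 2, "Noise")]

# Sum all qty values per unique (account_nr, staff_name) pair.
totals = defaultdict(int)
for account_nr, staff_name, item, qty, noise in rows:
    totals[(account_nr, staff_name)] += qty

# Emit one record per pair: (account_nr, staff_name, total_qty).
records = [(account_nr, staff_name, total_qty)
           for (account_nr, staff_name), total_qty in totals.items()]
```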
Re: Looking for direction
On 05/13/2015 07:24 PM, 20/20 Lab wrote:
> I'm a beginner to python. Reading here and there. Written a couple of
> short and simple programs to make life easier around the office.

Welcome to Python, and to this mailing list.

> That being said, I'm not even sure what I need to ask for. I've never
> worked with external data before.
>
> I have a LARGE csv file that I need to process. 110+ columns, 72k rows.

That's not very large at all.

> I managed to write enough to reduce it to a few hundred rows, and the
> five columns I'm interested in.
>
> Now is where I have my problem:
>
> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>            [72976, "YYY", "Item", "Qty", "Noise"],
>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]

It'd probably be useful to identify names for your columns, even if
it's just in a comment. Guessing from the paragraph below, I figure
the first two columns are "account" & "staff".

> Basically, I need to check for rows with duplicate accounts (row[0]) and
> staff (row[1]), and if so, remove that row and add its Qty to the
> original row.

And which column is that supposed to be? Shouldn't there be a number
there, rather than a string?

> I really don't have a clue how to go about this. The number of rows
> changes based on which run it is, so I couldn't even get away with
> using hundreds of compare loops.
>
> If someone could point me to some documentation on the functions I would
> need, or a tutorial, it would be a great help.

Is the order significant? Do you have to preserve the order in which the
accounts appear? I'll assume not.

Have you studied dictionaries? Seems to me the way to handle the problem
is to read in a row, and create a dictionary with a key of (account,
staff), and data of the rest of the line.

Each time you read a row, you check whether the key is already in the
dictionary. If not, add it. If it's already there, merge the data as
you say.

Then when you're done, turn the dict back into a list of lists.

-- 
DaveA
--
https://mail.python.org/mailman/listinfo/python-list
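[Editor's note: a sketch of the dictionary approach Dave describes, including one possible answer to his "how do you merge the other columns" question: keep the first item seen, sum the quantities, and join the non-empty notes. The sample data and the merge policy are illustrative assumptions, not part of the thread.]

```python
# Sample rows: [account, staff, item, qty, notes] (assumed data).
rows = [[123, "XXX", "Widget", 5, "rush order"],
        [72976, "YYY", "Gadget", 3, ""],
        [123, "XXX", "Widget", 2, "reorder"]]

merged = {}
for account, staff, item, qty, notes in rows:
    key = (account, staff)  # a tuple works fine as a dict key
    if key in merged:
        old_item, old_qty, old_notes = merged[key]
        # Keep the first item seen, sum quantities, join non-empty notes.
        merged[key] = (old_item, old_qty + qty,
                       "; ".join(filter(None, [old_notes, notes])))
    else:
        merged[key] = (item, qty, notes)

# Turn the dict back into a list of lists, as Dave suggests.
result = [[account, staff, *rest] for (account, staff), rest in merged.items()]
```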
Re: Looking for direction
On 14/05/2015 00:24, 20/20 Lab wrote:
> I'm a beginner to python. Reading here and there. Written a couple of
> short and simple programs to make life easier around the office.

Welcome :)

> That being said, I'm not even sure what I need to ask for. I've never
> worked with external data before.
>
> I have a LARGE csv file that I need to process. 110+ columns, 72k rows.
> I managed to write enough to reduce it to a few hundred rows, and the
> five columns I'm interested in.
>
> Now is where I have my problem:
>
> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>            [72976, "YYY", "Item", "Qty", "Noise"],
>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>
> Basically, I need to check for rows with duplicate accounts (row[0]) and
> staff (row[1]), and if so, remove that row and add its Qty to the
> original row. I really don't have a clue how to go about this. The
> number of rows changes based on which run it is, so I couldn't even get
> away with using hundreds of compare loops.
>
> If someone could point me to some documentation on the functions I would
> need, or a tutorial, it would be a great help.
>
> Thank you.

Check this out: http://pandas.pydata.org/

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
--
https://mail.python.org/mailman/listinfo/python-list