Re: [Tutor] List processing question - consolidating duplicate entries
Richard Querin wrote:
> import itertools, operator
> for k, g in itertools.groupby(sorted(data), key=operator.itemgetter(0, 1, 2, 3)):
>     print k, sum(item[4] for item in g)
>
> I'm trying to understand what's going on in the for statement but I'm
> having troubles. The interpreter is telling me that itemgetter expects 1
> argument and is getting 4.

You must be using an older version of Python, the ability to pass
multiple arguments to itemgetter was added in 2.5. Meanwhile it's easy
enough to define your own:

def make_key(item):
    return item[:4]

and then specify key=make_key.

BTW when you want help with an error, please copy and paste the entire
error message and traceback into your email.

> I understand that groupby takes 2 parameters the first being the sorted
> list. The second is a key and this is where I'm confused. The itemgetter
> function is going to return a tuple of functions (f[0],f[1],f[2],f[3]).

No, it returns one function that will return a tuple of values.

> Should I only be calling itemgetter with whatever element (0 to 3) that
> I want to group the items by?

If you do that it will only group by the single item you specify.
groupby() doesn't sort so you should also sort by the same key. But I
don't think that is what you want.

Kent
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
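Kent's make_key suggestion can be fleshed out into a runnable sketch (Python 3 syntax; the variable names and the three-row data sample are illustrative, not from the thread):

```python
import itertools

data = [['Bob', '07129', 'projectA', '4001', 5],
        ['Bob', '07129', 'projectA', '5001', 2],
        ['Bob', '07129', 'projectA', '4001', 4]]

def make_key(item):
    # The first four fields (name, job#, jobname, workcode) form the group key.
    return tuple(item[:4])

# groupby() only merges adjacent equal keys, so sort first.
result = []
for k, g in itertools.groupby(sorted(data), key=make_key):
    result.append(list(k) + [sum(item[4] for item in g)])

print(result)
# [['Bob', '07129', 'projectA', '4001', 9], ['Bob', '07129', 'projectA', '5001', 2]]
```

This works on any Python version with groupby(), since make_key replaces the multi-argument itemgetter call that older interpreters reject.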
Re: [Tutor] List processing question - consolidating duplicate entries
On Nov 27, 2007 5:40 PM, Kent Johnson <[EMAIL PROTECTED]> wrote:
>
> This is a two-liner using itertools.groupby() and operator.itemgetter:
>
> data = [['Bob', '07129', 'projectA', '4001', 5],
>         ['Bob', '07129', 'projectA', '5001', 2],
>         ['Bob', '07101', 'projectB', '4001', 1],
>         ['Bob', '07140', 'projectC', '3001', 3],
>         ['Bob', '07099', 'projectD', '3001', 2],
>         ['Bob', '07129', 'projectA', '4001', 4],
>         ['Bob', '07099', 'projectD', '4001', 3],
>         ['Bob', '07129', 'projectA', '4001', 2]
>        ]
>
> import itertools, operator
> for k, g in itertools.groupby(sorted(data), key=operator.itemgetter(0, 1, 2, 3)):
>     print k, sum(item[4] for item in g)

I'm trying to understand what's going on in the for statement but I'm
having troubles. The interpreter is telling me that itemgetter expects 1
argument and is getting 4.

I understand that groupby takes 2 parameters, the first being the sorted
list. The second is a key and this is where I'm confused. The itemgetter
function is going to return a tuple of functions (f[0],f[1],f[2],f[3]).
Should I only be calling itemgetter with whatever element (0 to 3) that
I want to group the items by?

I'm almost getting this but not quite. ;)

RQ
Re: [Tutor] List processing question - consolidating duplicate entries
Michael Langford wrote:
> What you want is a set of entries.

Not really; he wants to aggregate entries.

> # remove duplicate entries
> #
> # myEntries is a list of lists,
> # such as [[1,2,3],[1,2,"foo"],[1,2,3]]
> #
> s = set()
> [s.add(tuple(x)) for x in myEntries]

A set can be constructed directly from a sequence so this can be written as

s = set(tuple(x) for x in myEntries)

BTW I personally think it is bad style to use a list comprehension just
for the side effect of iteration; IMO it is clearer to write out the
loop when you want a loop.

Kent
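The two spellings can be checked against each other on the sample data from the thread; this is purely a style comparison, since both build the same set:

```python
myEntries = [[1, 2, 3], [1, 2, "foo"], [1, 2, 3]]

# Side-effect list comprehension (the style Kent advises against):
s1 = set()
[s1.add(tuple(x)) for x in myEntries]

# Direct construction from a generator expression, as Kent suggests:
s2 = set(tuple(x) for x in myEntries)

print(s1 == s2)
# True
```

The generator-expression form also avoids building and discarding a useless list of None values, which is what the side-effect comprehension produces.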
Re: [Tutor] List processing question - consolidating duplicate entries
What you want is a set of entries. Unfortunately, Python lists are not
"hashable", which means you have to convert them to something hashable
before you can use the Python set datatype. What you'd like to do is add
each entry to a set while converting it to a tuple, then convert them
back out of the set. In Python that is:

#
# remove duplicate entries
#
# myEntries is a list of lists,
# such as [[1,2,3],[1,2,"foo"],[1,2,3]]
#
s = set()
[s.add(tuple(x)) for x in myEntries]
myEntries = [list(x) for x in s]

List comprehensions are useful for all sorts of list work, this
included. Do not use a database; that would be very ugly and
time-consuming too. This is cleaner than the dict-keys approach, as
you'd *also* have to convert to tuples for that. If you need this in
non-comprehension form, I'd be happy to write one if that's clearer to
you on what's happening.

--Michael

--
Michael Langford
Phone: 404-386-0495
Consulting: http://www.RowdyLabs.com
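Run on the sample data, the round trip looks like this (note one caveat the post doesn't mention: a set has no defined order, so the deduplicated rows can come back in any order):

```python
myEntries = [[1, 2, 3], [1, 2, "foo"], [1, 2, 3]]

# Convert each inner list to a hashable tuple, collect them in a set
# to drop duplicates, then convert each tuple back to a list.
s = set(tuple(x) for x in myEntries)
myEntries = [list(x) for x in s]

print(len(myEntries))  # the two [1, 2, 3] rows collapse into one
# 2
```

As Kent points out in his reply, though, this only removes exact duplicates; it cannot add up the hours of rows that agree on the first four fields, which is what the original question asks for.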
Re: [Tutor] List processing question - consolidating duplicate entries
bob gailer wrote:
> 2 - Sort the list. Create a new list with an entry for the first name,
> project, workcode. Step thru the list. Each time the name, project,
> workcode is the same, accumulate hours. When any of those change, create
> a list entry for the next name, project, workcode and again start
> accumulating hours.

This is a two-liner using itertools.groupby() and operator.itemgetter:

data = [['Bob', '07129', 'projectA', '4001', 5],
        ['Bob', '07129', 'projectA', '5001', 2],
        ['Bob', '07101', 'projectB', '4001', 1],
        ['Bob', '07140', 'projectC', '3001', 3],
        ['Bob', '07099', 'projectD', '3001', 2],
        ['Bob', '07129', 'projectA', '4001', 4],
        ['Bob', '07099', 'projectD', '4001', 3],
        ['Bob', '07129', 'projectA', '4001', 2]
       ]

import itertools, operator
for k, g in itertools.groupby(sorted(data), key=operator.itemgetter(0, 1, 2, 3)):
    print k, sum(item[4] for item in g)

For some explanation see my recent post:
http://mail.python.org/pipermail/tutor/2007-November/058753.html

Kent
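In modern Python 3 (where print is a function), Kent's two-liner runs as below; collecting the totals into a dict is an addition for inspection, not part of his post:

```python
import itertools, operator

data = [['Bob', '07129', 'projectA', '4001', 5],
        ['Bob', '07129', 'projectA', '5001', 2],
        ['Bob', '07101', 'projectB', '4001', 1],
        ['Bob', '07140', 'projectC', '3001', 3],
        ['Bob', '07099', 'projectD', '3001', 2],
        ['Bob', '07129', 'projectA', '4001', 4],
        ['Bob', '07099', 'projectD', '4001', 3],
        ['Bob', '07129', 'projectA', '4001', 2]]

# groupby() only merges *adjacent* equal keys, hence the sorted() call
# with the same four-field key used for grouping.
totals = {}
for k, g in itertools.groupby(sorted(data), key=operator.itemgetter(0, 1, 2, 3)):
    totals[k] = sum(item[4] for item in g)

print(totals[('Bob', '07129', 'projectA', '4001')])  # 5 + 4 + 2
# 11
```

The eight input rows collapse into six groups, matching the consolidated list the original question asked for.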
Re: [Tutor] List processing question - consolidating duplicate entries
Richard Querin wrote:
> I'm trying to process a list and I'm stuck. Hopefully someone can help
> me out here:
>
> I've got a list that is formatted as follows:
> [Name,job#,jobname,workcode,hours]
>
> An example might be:
>
> [Bob,07129,projectA,4001,5]
> [Bob,07129,projectA,5001,2]
> [Bob,07101,projectB,4001,1]
> [Bob,07140,projectC,3001,3]
> [Bob,07099,projectD,3001,2]
> [Bob,07129,projectA,4001,4]
> [Bob,07099,projectD,4001,3]
> [Bob,07129,projectA,4001,2]
>
> Now I'd like to consolidate entries that are duplicates. Duplicates
> meaning entries that share the same Name, job#, jobname and workcode.
> So for the list above, there are 3 entries for projectA which have a
> workcode of 4001. (There is a fourth entry for projectA but its
> workcode is 5001 and not 4001.)
>
> So I'd like to end up with a list so that the three duplicate entries
> are consolidated into one with their hours added up:
>
> [Bob,07129,projectA,4001,11]
> [Bob,07129,projectA,5001,2]
> [Bob,07101,projectB,4001,1]
> [Bob,07140,projectC,3001,3]
> [Bob,07099,projectD,3001,2]
> [Bob,07099,projectD,4001,3]

There are at least 2 more approaches.

1 - Use sqlite (or some other database): insert the data into the
database, then run a SQL statement to sum(hours) group by name, project,
workcode.

2 - Sort the list. Create a new list with an entry for the first name,
project, workcode. Step thru the list. Each time the name, project,
workcode is the same, accumulate hours. When any of those change, create
a list entry for the next name, project, workcode and again start
accumulating hours.

The last is IMHO the most straightforward, and easiest to code.
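Approach 1 can be sketched with the sqlite3 module from the standard library; the table and column names here are made up for the example, and only a few rows of the sample data are used:

```python
import sqlite3

data = [('Bob', '07129', 'projectA', '4001', 5),
        ('Bob', '07129', 'projectA', '5001', 2),
        ('Bob', '07129', 'projectA', '4001', 4),
        ('Bob', '07129', 'projectA', '4001', 2)]

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
conn.execute('CREATE TABLE hours (name TEXT, job TEXT, project TEXT, '
             'workcode TEXT, hours INTEGER)')
conn.executemany('INSERT INTO hours VALUES (?, ?, ?, ?, ?)', data)

# One SQL statement does the whole consolidation.
rows = conn.execute('SELECT name, job, project, workcode, SUM(hours) '
                    'FROM hours '
                    'GROUP BY name, job, project, workcode').fetchall()
```

Whether the database detour is worth it depends on the data: for a list already in memory, the sort-and-accumulate loop (or groupby, as Kent shows elsewhere in the thread) involves less machinery.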
Re: [Tutor] List processing question - consolidating duplicate entries
On 28/11/2007, Richard Querin <[EMAIL PROTECTED]> wrote:
> I've got a list that is formatted as follows:
> [Name,job#,jobname,workcode,hours]
[...]
> Now I'd like to consolidate entries that are duplicates. Duplicates
> meaning entries that share the same Name, job#, jobname and workcode.
> So for the list above, there are 3 entries for projectA which have a
> workcode of 4001. (There is a fourth entry for projectA but its
> workcode is 5001 and not 4001.)

You use a dictionary: pull out the jobname and workcode as the
dictionary key.

import operator

# if job is an element of the list, then jobKey(job) will be
# (jobname, workcode)
jobKey = operator.itemgetter(2, 3)

jobList = [...]  # the list of jobs
jobDict = {}
for job in jobList:
    try:
        jobDict[jobKey(job)][4] += job[4]
    except KeyError:
        jobDict[jobKey(job)] = job

(Note that this will modify the jobs in your original list; if this is
bad, you can replace the last line with "... = job[:]".)

HTH!

--
John.
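John's loop, run on a few rows from the thread with the job[:] copy applied so the originals stay untouched (the three-row jobList is an abbreviated sample, not the full data):

```python
import operator

jobList = [['Bob', '07129', 'projectA', '4001', 5],
           ['Bob', '07129', 'projectA', '4001', 4],
           ['Bob', '07129', 'projectA', '5001', 2]]

jobKey = operator.itemgetter(2, 3)  # key is (jobname, workcode)

jobDict = {}
for job in jobList:
    try:
        # Key already seen: add this row's hours to the stored row.
        jobDict[jobKey(job)][4] += job[4]
    except KeyError:
        # First time we see this key: store a *copy* of the row.
        jobDict[jobKey(job)] = job[:]

print(jobDict[('projectA', '4001')])  # hours merged: 5 + 4
# ['Bob', '07129', 'projectA', '4001', 9]
```

Note the key is only (jobname, workcode); if the same jobname could appear under different names or job numbers, the key would need all four fields, e.g. operator.itemgetter(0, 1, 2, 3).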
[Tutor] List processing question - consolidating duplicate entries
I'm trying to process a list and I'm stuck. Hopefully someone can help
me out here:

I've got a list that is formatted as follows:
[Name,job#,jobname,workcode,hours]

An example might be:

[Bob,07129,projectA,4001,5]
[Bob,07129,projectA,5001,2]
[Bob,07101,projectB,4001,1]
[Bob,07140,projectC,3001,3]
[Bob,07099,projectD,3001,2]
[Bob,07129,projectA,4001,4]
[Bob,07099,projectD,4001,3]
[Bob,07129,projectA,4001,2]

Now I'd like to consolidate entries that are duplicates. Duplicates
meaning entries that share the same Name, job#, jobname and workcode.
So for the list above, there are 3 entries for projectA which have a
workcode of 4001. (There is a fourth entry for projectA but its
workcode is 5001 and not 4001.)

So I'd like to end up with a list so that the three duplicate entries
are consolidated into one with their hours added up:

[Bob,07129,projectA,4001,11]
[Bob,07129,projectA,5001,2]
[Bob,07101,projectB,4001,1]
[Bob,07140,projectC,3001,3]
[Bob,07099,projectD,3001,2]
[Bob,07099,projectD,4001,3]

I've tried doing it with brute force by stepping through each item and
checking all the other items for matches, and then trying to build a
new list as I go, but that's still confusing me - for instance, how can
I delete the items that I've already consolidated so they don't get
processed again?

I'm not a programmer by trade so I'm sorry if this is a basic computer
science question.

RQ
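One way around the "how do I delete items I've already consolidated" problem is to never delete anything: accumulate hours in a dictionary keyed on the first four fields, and remember first-seen order separately. This is one possible sketch of an answer, not code from the thread, shown on an abbreviated sample:

```python
data = [['Bob', '07129', 'projectA', '4001', 5],
        ['Bob', '07129', 'projectA', '5001', 2],
        ['Bob', '07129', 'projectA', '4001', 4],
        ['Bob', '07129', 'projectA', '4001', 2]]

totals = {}   # (name, job#, jobname, workcode) -> summed hours
order = []    # keys in the order they were first seen
for name, job, project, workcode, hours in data:
    key = (name, job, project, workcode)
    if key not in totals:
        totals[key] = 0
        order.append(key)
    totals[key] += hours

# Rebuild the consolidated list, preserving first-appearance order.
consolidated = [list(key) + [totals[key]] for key in order]
print(consolidated)
# [['Bob', '07129', 'projectA', '4001', 11], ['Bob', '07129', 'projectA', '5001', 2]]
```

Each input row is visited exactly once, so there is nothing to delete; the replies above reach the same result via groupby(), a dictionary with itemgetter, or SQL GROUP BY.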