Re: [Tutor] List processing question - consolidating duplicate entries
Richard Querin wrote:
> import itertools, operator
> for k, g in itertools.groupby(sorted(data), key=operator.itemgetter(0, 1, 2, 3)):
>     print k, sum(item[4] for item in g)
>
> I'm trying to understand what's going on in the for statement but I'm
> having troubles. The interpreter is telling me that itemgetter expects 1
> argument and is getting 4.

You must be using an older version of Python, the ability to pass
multiple arguments to itemgetter was added in 2.5. Meanwhile it's easy
enough to define your own:

def make_key(item):
    return item[:4]

and then specify key=make_key.

BTW when you want help with an error, please copy and paste the entire
error message and traceback into your email.

> I understand that groupby takes 2 parameters the first being the sorted
> list. The second is a key and this is where I'm confused. The itemgetter
> function is going to return a tuple of functions (f[0],f[1],f[2],f[3]).

No, it returns one function that will return a tuple of values.

> Should I only be calling itemgetter with whatever element (0 to 3) that
> I want to group the items by?

If you do that it will only group by the single item you specify.
groupby() doesn't sort so you should also sort by the same key. But I
don't think that is what you want.

Kent
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
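Kent's make_key suggestion can be fleshed out into a runnable sketch (Python 3 syntax; the variable names and the three-row data sample are illustrative, not from the thread):

```python
import itertools

data = [['Bob', '07129', 'projectA', '4001', 5],
        ['Bob', '07129', 'projectA', '5001', 2],
        ['Bob', '07129', 'projectA', '4001', 4]]

def make_key(item):
    # The first four fields (name, job#, jobname, workcode) form the group key.
    return tuple(item[:4])

# groupby() only merges adjacent equal keys, so sort first.
result = []
for k, g in itertools.groupby(sorted(data), key=make_key):
    result.append(list(k) + [sum(item[4] for item in g)])

print(result)
# [['Bob', '07129', 'projectA', '4001', 9], ['Bob', '07129', 'projectA', '5001', 2]]
```

This works on any Python version with groupby(), since make_key replaces the multi-argument itemgetter call that older interpreters reject.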
Re: [Tutor] List processing question - consolidating duplicate entries
On Nov 27, 2007 5:40 PM, Kent Johnson <[EMAIL PROTECTED]> wrote:
>
> This is a two-liner using itertools.groupby() and operator.itemgetter:
>
> data = [['Bob', '07129', 'projectA', '4001', 5],
>         ['Bob', '07129', 'projectA', '5001', 2],
>         ['Bob', '07101', 'projectB', '4001', 1],
>         ['Bob', '07140', 'projectC', '3001', 3],
>         ['Bob', '07099', 'projectD', '3001', 2],
>         ['Bob', '07129', 'projectA', '4001', 4],
>         ['Bob', '07099', 'projectD', '4001', 3],
>         ['Bob', '07129', 'projectA', '4001', 2]
>        ]
>
> import itertools, operator
> for k, g in itertools.groupby(sorted(data), key=operator.itemgetter(0, 1, 2, 3)):
>     print k, sum(item[4] for item in g)

I'm trying to understand what's going on in the for statement but I'm
having troubles. The interpreter is telling me that itemgetter expects 1
argument and is getting 4.

I understand that groupby takes 2 parameters, the first being the sorted
list. The second is a key and this is where I'm confused. The itemgetter
function is going to return a tuple of functions (f[0],f[1],f[2],f[3]).
Should I only be calling itemgetter with whatever element (0 to 3) that
I want to group the items by?

I'm almost getting this but not quite. ;)

RQ
Re: [Tutor] List processing question - consolidating duplicate entries
Michael Langford wrote:
> What you want is a set of entries.

Not really; he wants to aggregate entries.

> # remove duplicate entries
> #
> # myEntries is a list of lists,
> # such as [[1,2,3],[1,2,"foo"],[1,2,3]]
> #
> s = set()
> [s.add(tuple(x)) for x in myEntries]

A set can be constructed directly from a sequence so this can be written as

s = set(tuple(x) for x in myEntries)

BTW I personally think it is bad style to use a list comprehension just
for the side effect of iteration; IMO it is clearer to write out the
loop when you want a loop.

Kent
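The two spellings can be checked against each other on the sample data from the thread; this is purely a style comparison, since both build the same set:

```python
myEntries = [[1, 2, 3], [1, 2, "foo"], [1, 2, 3]]

# Side-effect list comprehension (the style Kent advises against):
s1 = set()
[s1.add(tuple(x)) for x in myEntries]

# Direct construction from a generator expression, as Kent suggests:
s2 = set(tuple(x) for x in myEntries)

print(s1 == s2)
# True
```

The generator-expression form also avoids building and discarding a useless list of None values, which is what the side-effect comprehension produces.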
Re: [Tutor] List processing question - consolidating duplicate entries
What you want is a set of entries. Unfortunately, Python lists are not
"hashable", which means you have to convert them to something hashable
before you can use the Python set datatype. What you'd like to do is add
each entry to a set while converting it to a tuple, then convert them
back out of the set. In Python that is:

#
# remove duplicate entries
#
# myEntries is a list of lists,
# such as [[1,2,3],[1,2,"foo"],[1,2,3]]
#
s = set()
[s.add(tuple(x)) for x in myEntries]
myEntries = [list(x) for x in s]

List comprehensions are useful for all sorts of list work, this
included. Do not use a database; that would be very ugly and
time-consuming too. This is cleaner than the dict-keys approach, as
you'd *also* have to convert to tuples for that. If you need this in
non-comprehension form, I'd be happy to write one if that's clearer to
you on what's happening.

--Michael

--
Michael Langford
Phone: 404-386-0495
Consulting: http://www.RowdyLabs.com
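Run on the sample data, the round trip looks like this (note one caveat the post doesn't mention: a set has no defined order, so the deduplicated rows can come back in any order):

```python
myEntries = [[1, 2, 3], [1, 2, "foo"], [1, 2, 3]]

# Convert each inner list to a hashable tuple, collect them in a set
# to drop duplicates, then convert each tuple back to a list.
s = set(tuple(x) for x in myEntries)
myEntries = [list(x) for x in s]

print(len(myEntries))  # the two [1, 2, 3] rows collapse into one
# 2
```

As Kent points out in his reply, though, this only removes exact duplicates; it cannot add up the hours of rows that agree on the first four fields, which is what the original question asks for.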
Re: [Tutor] List processing question - consolidating duplicate entries
bob gailer wrote:
> 2 - Sort the list. Create a new list with an entry for the first name,
> project, workcode. Step thru the list. Each time the name, project,
> workcode is the same, accumulate hours. When any of those change, create
> a list entry for the next name, project, workcode and again start
> accumulating hours.

This is a two-liner using itertools.groupby() and operator.itemgetter:

data = [['Bob', '07129', 'projectA', '4001', 5],
        ['Bob', '07129', 'projectA', '5001', 2],
        ['Bob', '07101', 'projectB', '4001', 1],
        ['Bob', '07140', 'projectC', '3001', 3],
        ['Bob', '07099', 'projectD', '3001', 2],
        ['Bob', '07129', 'projectA', '4001', 4],
        ['Bob', '07099', 'projectD', '4001', 3],
        ['Bob', '07129', 'projectA', '4001', 2]
       ]

import itertools, operator
for k, g in itertools.groupby(sorted(data), key=operator.itemgetter(0, 1, 2, 3)):
    print k, sum(item[4] for item in g)

For some explanation see my recent post:
http://mail.python.org/pipermail/tutor/2007-November/058753.html

Kent
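In modern Python 3 (where print is a function), Kent's two-liner runs as below; collecting the totals into a dict is an addition for inspection, not part of his post:

```python
import itertools, operator

data = [['Bob', '07129', 'projectA', '4001', 5],
        ['Bob', '07129', 'projectA', '5001', 2],
        ['Bob', '07101', 'projectB', '4001', 1],
        ['Bob', '07140', 'projectC', '3001', 3],
        ['Bob', '07099', 'projectD', '3001', 2],
        ['Bob', '07129', 'projectA', '4001', 4],
        ['Bob', '07099', 'projectD', '4001', 3],
        ['Bob', '07129', 'projectA', '4001', 2]]

# groupby() only merges *adjacent* equal keys, hence the sorted() call
# with the same four-field key used for grouping.
totals = {}
for k, g in itertools.groupby(sorted(data), key=operator.itemgetter(0, 1, 2, 3)):
    totals[k] = sum(item[4] for item in g)

print(totals[('Bob', '07129', 'projectA', '4001')])  # 5 + 4 + 2
# 11
```

The eight input rows collapse into six groups, matching the consolidated list the original question asked for.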
Re: [Tutor] List processing question - consolidating duplicate entries
Richard Querin wrote:
> I'm trying to process a list and I'm stuck. Hopefully someone can help
> me out here:
>
> I've got a list that is formatted as follows:
> [Name,job#,jobname,workcode,hours]
>
> An example might be:
>
> [Bob,07129,projectA,4001,5]
> [Bob,07129,projectA,5001,2]
> [Bob,07101,projectB,4001,1]
> [Bob,07140,projectC,3001,3]
> [Bob,07099,projectD,3001,2]
> [Bob,07129,projectA,4001,4]
> [Bob,07099,projectD,4001,3]
> [Bob,07129,projectA,4001,2]
>
> Now I'd like to consolidate entries that are duplicates. Duplicates
> meaning entries that share the same Name, job#, jobname and workcode.
> So for the list above, there are 3 entries for projectA which have a
> workcode of 4001. (There is a fourth entry for projectA but its
> workcode is 5001 and not 4001.)
>
> So I'd like to end up with a list so that the three duplicate entries
> are consolidated into one with their hours added up:
>
> [Bob,07129,projectA,4001,11]
> [Bob,07129,projectA,5001,2]
> [Bob,07101,projectB,4001,1]
> [Bob,07140,projectC,3001,3]
> [Bob,07099,projectD,3001,2]
> [Bob,07099,projectD,4001,3]

There are at least 2 more approaches.

1 - Use sqlite (or some other database): insert the data into the
database, then run a SQL statement to sum(hours) group by name, project,
workcode.

2 - Sort the list. Create a new list with an entry for the first name,
project, workcode. Step thru the list. Each time the name, project,
workcode is the same, accumulate hours. When any of those change, create
a list entry for the next name, project, workcode and again start
accumulating hours.

The last is IMHO the most straightforward, and easiest to code.
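Approach 1 can be sketched with the sqlite3 module from the standard library; the table and column names here are made up for the example, and only a few rows of the sample data are used:

```python
import sqlite3

data = [('Bob', '07129', 'projectA', '4001', 5),
        ('Bob', '07129', 'projectA', '5001', 2),
        ('Bob', '07129', 'projectA', '4001', 4),
        ('Bob', '07129', 'projectA', '4001', 2)]

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
conn.execute('CREATE TABLE hours (name TEXT, job TEXT, project TEXT, '
             'workcode TEXT, hours INTEGER)')
conn.executemany('INSERT INTO hours VALUES (?, ?, ?, ?, ?)', data)

# One SQL statement does the whole consolidation.
rows = conn.execute('SELECT name, job, project, workcode, SUM(hours) '
                    'FROM hours '
                    'GROUP BY name, job, project, workcode').fetchall()
```

Whether the database detour is worth it depends on the data: for a list already in memory, the sort-and-accumulate loop (or groupby, as Kent shows elsewhere in the thread) involves less machinery.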
Re: [Tutor] List processing question - consolidating duplicate entries
On 28/11/2007, Richard Querin <[EMAIL PROTECTED]> wrote:
> I've got a list that is formatted as follows:
> [Name,job#,jobname,workcode,hours]
[...]
> Now I'd like to consolidate entries that are duplicates. Duplicates
> meaning entries that share the same Name, job#, jobname and workcode.
> So for the list above, there are 3 entries for projectA which have a
> workcode of 4001. (There is a fourth entry for projectA but its
> workcode is 5001 and not 4001.)

You use a dictionary: pull out the jobname and workcode as the
dictionary key.

import operator

# if job is an element of the list, then jobKey(job) will be
# (jobname, workcode)
jobKey = operator.itemgetter(2, 3)

jobList = [...]  # the list of jobs
jobDict = {}
for job in jobList:
    try:
        jobDict[jobKey(job)][4] += job[4]
    except KeyError:
        jobDict[jobKey(job)] = job

(Note that this will modify the jobs in your original list; if this is
bad, you can replace the last line with "... = job[:]".)

HTH!

--
John.
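John's loop, run on a few rows from the thread with the job[:] copy applied so the originals stay untouched (the three-row jobList is an abbreviated sample, not the full data):

```python
import operator

jobList = [['Bob', '07129', 'projectA', '4001', 5],
           ['Bob', '07129', 'projectA', '4001', 4],
           ['Bob', '07129', 'projectA', '5001', 2]]

jobKey = operator.itemgetter(2, 3)  # key is (jobname, workcode)

jobDict = {}
for job in jobList:
    try:
        # Key already seen: add this row's hours to the stored row.
        jobDict[jobKey(job)][4] += job[4]
    except KeyError:
        # First time we see this key: store a *copy* of the row.
        jobDict[jobKey(job)] = job[:]

print(jobDict[('projectA', '4001')])  # hours merged: 5 + 4
# ['Bob', '07129', 'projectA', '4001', 9]
```

Note the key is only (jobname, workcode); if the same jobname could appear under different names or job numbers, the key would need all four fields, e.g. operator.itemgetter(0, 1, 2, 3).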
[Tutor] List processing question - consolidating duplicate entries
I'm trying to process a list and I'm stuck. Hopefully someone can help
me out here:

I've got a list that is formatted as follows:
[Name,job#,jobname,workcode,hours]

An example might be:

[Bob,07129,projectA,4001,5]
[Bob,07129,projectA,5001,2]
[Bob,07101,projectB,4001,1]
[Bob,07140,projectC,3001,3]
[Bob,07099,projectD,3001,2]
[Bob,07129,projectA,4001,4]
[Bob,07099,projectD,4001,3]
[Bob,07129,projectA,4001,2]

Now I'd like to consolidate entries that are duplicates. Duplicates
meaning entries that share the same Name, job#, jobname and workcode.
So for the list above, there are 3 entries for projectA which have a
workcode of 4001. (There is a fourth entry for projectA but its
workcode is 5001 and not 4001.)

So I'd like to end up with a list so that the three duplicate entries
are consolidated into one with their hours added up:

[Bob,07129,projectA,4001,11]
[Bob,07129,projectA,5001,2]
[Bob,07101,projectB,4001,1]
[Bob,07140,projectC,3001,3]
[Bob,07099,projectD,3001,2]
[Bob,07099,projectD,4001,3]

I've tried doing it with brute force by stepping through each item and
checking all the other items for matches, and then trying to build a
new list as I go, but that's still confusing me - for instance, how can
I delete the items that I've already consolidated so they don't get
processed again?

I'm not a programmer by trade so I'm sorry if this is a basic computer
science question.

RQ
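One way around the "how do I delete items I've already consolidated" problem is to never delete anything: accumulate hours in a dictionary keyed on the first four fields, and remember first-seen order separately. This is one possible sketch of an answer, not code from the thread, shown on an abbreviated sample:

```python
data = [['Bob', '07129', 'projectA', '4001', 5],
        ['Bob', '07129', 'projectA', '5001', 2],
        ['Bob', '07129', 'projectA', '4001', 4],
        ['Bob', '07129', 'projectA', '4001', 2]]

totals = {}   # (name, job#, jobname, workcode) -> summed hours
order = []    # keys in the order they were first seen
for name, job, project, workcode, hours in data:
    key = (name, job, project, workcode)
    if key not in totals:
        totals[key] = 0
        order.append(key)
    totals[key] += hours

# Rebuild the consolidated list, preserving first-appearance order.
consolidated = [list(key) + [totals[key]] for key in order]
print(consolidated)
# [['Bob', '07129', 'projectA', '4001', 11], ['Bob', '07129', 'projectA', '5001', 2]]
```

Each input row is visited exactly once, so there is nothing to delete; the replies above reach the same result via groupby(), a dictionary with itemgetter, or SQL GROUP BY.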