Re: [Tutor] Dictionaries and aggregation
Right think I've got the idea now. Thanks for all contributions on this.

Paul

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Karl "Pflästerer"
Sent: 25 April 2006 22:28
To: tutor@python.org
Subject: Re: [Tutor] Dictionaries and aggregation

On 25 Apr 2006, [EMAIL PROTECTED] wrote:

[...]
> Here's the list I'm starting with:
>
> for i in rLst:
>     print i, type(i)
>
> server001 alive 17.1% 2 requests/s 14805416 total
> server001 alive 27.2% 7 requests/s 14851125 total
> server002 alive 22.9% 6 requests/s 15173311 total
> server002 alive 42.0% 8 requests/s 15147869 total
> server003 alive 17.9% 4 requests/s 15220280 total
> server003 alive 22.0% 4 requests/s 15260951 total
> server004 alive 18.5% 3 requests/s 15484524 total
> server004 alive 31.6% 9 requests/s 15429303 total
>
> I've split each string in the list, extracted what I want and feed it into
> an empty dictionary.
>
> rDict ={}
> i = 0
> while i < (len(rLst)):
>     x, y = rLst[i].split()[0], int(rLst[i].split()[3])
>     rDict[x] = y
>     print x, y, type(x), type(y)
>     i += 1
[...]
> What I'm hoping to be able to do is update the value, rather than replace
> it, so that it gives me the total i.e.
>
> server001 9
> server003 10
> server002 14
> server004 20

This is easily done.

.>>> rdict = {}
.>>> for line in lst:
         ans = line.split()
         rdict[ans[0]] = rdict.get(ans[0], 0) + int(ans[3])
.>>> rdict
.{'server002': 14, 'server003': 8, 'server001': 9, 'server004': 12}

Dictionaries have a get() method, which has an optional second argument,
which gets returned if there is no such key in the dictionary.

Karl
--
Please do *not* send copies of replies to me.
I read the list

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
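
The get()-based accumulation described in Karl's reply can be written as a standalone script. What follows is only a minimal sketch in Python 3 syntax (the thread itself uses Python 2), with the eight lines from the quoted listing hard-coded. The function names aggregate_requests and aggregate_requests_defaultdict are made up for illustration, and the collections.defaultdict variant is an alternative that does not appear anywhere in the thread.

```python
from collections import defaultdict

# Sample lines copied from the listing quoted in the message above.
lines = [
    "server001 alive 17.1% 2 requests/s 14805416 total",
    "server001 alive 27.2% 7 requests/s 14851125 total",
    "server002 alive 22.9% 6 requests/s 15173311 total",
    "server002 alive 42.0% 8 requests/s 15147869 total",
    "server003 alive 17.9% 4 requests/s 15220280 total",
    "server003 alive 22.0% 4 requests/s 15260951 total",
    "server004 alive 18.5% 3 requests/s 15484524 total",
    "server004 alive 31.6% 9 requests/s 15429303 total",
]


def aggregate_requests(lines):
    """Sum the requests/s field (index 3) per server name (index 0)."""
    totals = {}
    for line in lines:
        fields = line.split()
        name, requests = fields[0], int(fields[3])
        # get() returns the default 0 the first time a server is seen.
        totals[name] = totals.get(name, 0) + requests
    return totals


def aggregate_requests_defaultdict(lines):
    """Same aggregation; defaultdict(int) supplies the missing 0 itself."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        totals[fields[0]] += int(fields[3])
    return dict(totals)


totals = aggregate_requests(lines)
for name in sorted(totals):
    print(name, totals[name])
```

Both functions give the same totals; the get() form stays closest to the reply above, while defaultdict may read more easily once several counters are being kept.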
Re: [Tutor] Dictionaries and aggregation
Kent Johnson writes:

>> However here's what I'm now trying to do:
>>
>> 1) Not have to rely on using awk at all.
>>
>> 2) Create a dictionary with server names for keys e.g. server001,
>> server002 etc and the aggregate of the request for that server as the value
>> part of the pairing.
>>
>> I got this far with part 1)
>>
>> lbstat = commands.getoutput("sudo ~/ZLBbalctl --action=cells")
>> tmpLst = lbstat.split('\n')
>>
>> rLst = []
>> for i in tmpLst:
>>     m = re.search(' server[0-9]+', i)
>>     if m:
>>         rLst.append(i)
>>
>> for i in rLst:
>>     print i, type(i)
>>
>> server001 alive 22.3% 6 requests/s 14527762 total
>> server002 alive 23.5% 7 requests/s 14833265 total
>> server003 alive 38.2% 14 requests/s 14872750 total
>> server004 alive 15.6% 4 requests/s 15083443 total
>> server001 alive 24.1% 8 requests/s 14473672 total
>> server002 alive 23.2% 7 requests/s 14810866 total
>> server003 alive 30.2% 8 requests/s 14918322 total
>> server004 alive 22.1% 6 requests/s 15137847 total
>>
>> At this point I ran out of ideas and began to think that there must be
>> something fundamentally wrong with my approach. Not least of my concerns was
>> the fact that I needed integers and these were strings.
>
> Don't get discouraged, you are on the right track! You had one big
> string that included some data you are interested in and some you don't
> want, you have converted that to a list of strings containing only the
> lines of interest. That is a good first step. Now you have to extract
> the data you want out of each line.
>
> Use line.split() to split the text into fields by whitespace:
> In [1]: line = ' server001 alive 22.3% 6 requests/s 14527762 total'
>
> In [2]: line.split()
> Out[2]: ['server001', 'alive', '22.3%', '6', 'requests/s', '14527762',
> 'total']
>
> Indexing will pull out the field you want:
> In [3]: line.split()[5]
> Out[3]: '14527762'
>
> It's still a string:
> In [4]: type(line.split()[5])
> Out[4]: <type 'str'>
>
> Use int() to convert a string to an integer:
> In [5]: int(line.split()[5])
> Out[5]: 14527762
>
> Then you have to figure out how to accumulate the values in a dictionary
> but get this much working first.
>
> Kent

Thanks very much for the steer. I've made a fair bit of progress and look to
be within touching distance of getting the problem cracked.

Here's the list I'm starting with:

>>> for i in rLst:
...     print i, type(i)

server001 alive 17.1% 2 requests/s 14805416 total
server001 alive 27.2% 7 requests/s 14851125 total
server002 alive 22.9% 6 requests/s 15173311 total
server002 alive 42.0% 8 requests/s 15147869 total
server003 alive 17.9% 4 requests/s 15220280 total
server003 alive 22.0% 4 requests/s 15260951 total
server004 alive 18.5% 3 requests/s 15484524 total
server004 alive 31.6% 9 requests/s 15429303 total

I've split each string in the list, extracted what I want and fed it into
an empty dictionary.

>>> rDict = {}
>>> i = 0
>>> while i < (len(rLst)):
...     x, y = rLst[i].split()[0], int(rLst[i].split()[3])
...     rDict[x] = y
...     print x, y, type(x), type(y)
...     i += 1

server001 4
server001 5
server002 5
server002 9
server003 4
server003 6
server004 8
server004 12

I end up with this.

>>> for key, value in rDict.items():
...     print key, value

server001 5
server003 6
server002 9
server004 12

As I understand things, this is because keys must be unique, so the value
stored for each server is replaced by the last key/value pair fed in from
the loop.

What I'm hoping to be able to do is update the value, rather than replace
it, so that it gives me the total i.e.

server001 9
server003 10
server002 14
server004 20

Regards,

Paul
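
The difference between replacing a value and updating it, as described in this message, can be shown side by side. This is just a sketch in Python 3 syntax (the thread uses Python 2); it assumes the (server, requests/s) pairs are exactly the ones printed by the while loop in this message, and the names pairs, last_seen and running_total are invented for illustration.

```python
# (server, requests/s) pairs, as printed by the while loop in this message.
pairs = [
    ("server001", 4), ("server001", 5),
    ("server002", 5), ("server002", 9),
    ("server003", 4), ("server003", 6),
    ("server004", 8), ("server004", 12),
]

# Plain assignment: a later pair replaces the earlier value for the same key,
# so only the last value seen for each server survives.
last_seen = {}
for name, value in pairs:
    last_seen[name] = value

# Accumulation: start a key at 0 the first time it appears, then add to it.
running_total = {}
for name, value in pairs:
    if name not in running_total:
        running_total[name] = 0
    running_total[name] += value

print(last_seen)
print(running_total)
```

The first dictionary reproduces the "I end up with this" result; the second holds the per-server totals the message is asking for.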
[Tutor] Dictionaries and aggregation
I am trying to create a dictionary using data produced by a load balancing
admin tool and aggregate the results. When I invoke the tool from within the
shell (‘sudo ~/ZLBbalctl --action=cells’) the following output is produced:

Load Balancer 1 usage, over the last 30 seconds
Port 80, rules - /(nol)|(ws)
server001 alive 18.1% 2 requests/s 14536543 total
server002 alive 43.1% 7 requests/s 14842618 total
server003 alive 21.2% 2 requests/s 14884487 total
server004 alive 17.3% 2 requests/s 15092053 total

Load Balancer 2 usage, over the last 30 seconds
Port 80, rules - /(nol)|(ws)
server001 alive 11.6% 2 requests/s 14482578 total
server002 alive 35.6% 9 requests/s 14820991 total
server003 alive 28.7% 6 requests/s 14928991 total
server004 alive 23.7% 5 requests/s 15147525 total

I have managed to get something close to what I’m looking for using lists
i.e. the aggregate of the fourth column (requests/s)

lbstat = commands.getoutput("sudo ~/ZLBbalctl --action=cells | awk '$1 ~ /^server00/ { print $4 }'")
rLst = lbstat.split('\n')
rLst = [ int(rLst[i]) for i in range(len(rLst)) ]
rTotal = reduce(operator.add, rLst)

However here’s what I’m now trying to do:

1) Not have to rely on using awk at all.

2) Create a dictionary with server names for keys e.g. server001,
server002 etc and the aggregate of the request for that server as the value
part of the pairing.

I got this far with part 1)

lbstat = commands.getoutput("sudo ~/ZLBbalctl --action=cells")
tmpLst = lbstat.split('\n')

rLst = []
for i in tmpLst:
    m = re.search(' server[0-9]+', i)
    if m:
        rLst.append(i)

for i in rLst:
    print i, type(i)

server001 alive 22.3% 6 requests/s 14527762 total <type 'str'>
server002 alive 23.5% 7 requests/s 14833265 total <type 'str'>
server003 alive 38.2% 14 requests/s 14872750 total <type 'str'>
server004 alive 15.6% 4 requests/s 15083443 total <type 'str'>
server001 alive 24.1% 8 requests/s 14473672 total <type 'str'>
server002 alive 23.2% 7 requests/s 14810866 total <type 'str'>
server003 alive 30.2% 8 requests/s 14918322 total <type 'str'>
server004 alive 22.1% 6 requests/s 15137847 total <type 'str'>

At this point I ran out of ideas and began to think that there must be
something fundamentally wrong with my approach. Not least of my concerns was
the fact that I needed integers and these were strings.

Any help would be much appreciated.

Regards,

Paul
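
For part 1), the awk step (keep the server lines, print the fourth column) can be done in Python itself. The sketch below uses Python 3 syntax (the thread uses Python 2) and runs on a hard-coded copy of the tool output shown in this message instead of calling the tool. The leading space on each server line is an assumption inferred from the pattern ' server[0-9]+', and the names SAMPLE_OUTPUT, SERVER_LINE and requests_per_line are invented for illustration.

```python
import re

# Copy of the tool output shown in this message. The leading space on the
# server lines is an assumption; the pattern below tolerates its absence.
SAMPLE_OUTPUT = """\
Load Balancer 1 usage, over the last 30 seconds
Port 80, rules - /(nol)|(ws)
 server001 alive 18.1% 2 requests/s 14536543 total
 server002 alive 43.1% 7 requests/s 14842618 total
 server003 alive 21.2% 2 requests/s 14884487 total
 server004 alive 17.3% 2 requests/s 15092053 total
Load Balancer 2 usage, over the last 30 seconds
Port 80, rules - /(nol)|(ws)
 server001 alive 11.6% 2 requests/s 14482578 total
 server002 alive 35.6% 9 requests/s 14820991 total
 server003 alive 28.7% 6 requests/s 14928991 total
 server004 alive 23.7% 5 requests/s 15147525 total
"""

# Matches lines whose first field is a server name, like awk's $1 test.
SERVER_LINE = re.compile(r"^\s*server[0-9]+\s")


def requests_per_line(text):
    """Return the requests/s column (fourth field) of each server line as ints."""
    values = []
    for line in text.splitlines():
        if SERVER_LINE.match(line):
            # split() with no argument splits on any run of whitespace.
            values.append(int(line.split()[3]))
    return values


values = requests_per_line(SAMPLE_OUTPUT)
total = sum(values)  # stands in for reduce(operator.add, rLst)
print(values, total)
```

Converting each field with int() as it is extracted also deals with the strings-versus-integers concern, since the list never holds anything but integers.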