Re: [Tutor] Newbie Here -- Averaging & Adding Madness Over a Given (x) Range?!?!
On 02/15/2013 04:03 PM, Albert-Jan Roskam wrote:
>> Eventually what I'll need to do is:
>> 1. Index the file and/or count the lines, as to identify each line's
>> positional relevance so that it can average any range of numbers that are
>> sequential; one to one another.
>
> In other words: you would like to down-sample your data? For example, reduce
> a sampling frequency from 1000 samples/second (1 kHz) to 100, by averaging
> every ten sequential data points?

I think so. When I said 'index' in my OP, I wasn't sure how to explain that each line's position would be used to identify which group of (x) it belongs to. (That's all I meant.)

I am trying to identify gradient(s) in order to determine performance 'thresholds', if they exist. We are noticing that as the number of tasks already performed increases, the performance of a certain repeated task noticeably decreases. I am trying to determine that point/elbow in the performance curve.

I have been asked to identify and plot the overall 'average performance' with varying levels of granularity. (Averaging by 10, by 100, by 1000, etc.) The file I mentioned in my OP contains the measurement of time it takes to complete these repeated tasks. Each entry is on its own line. The recorded data is in literal order of completion. I am averaging those (ms time entries) in sets of (x) to keep from having to compute the difference in time for each completed task individually.

ie: Lines 1-10, (11-20, 21-30 --> to completion) are averaged and read into a list, or hash, in order.
or: Lines 1-100, (101-200, 201-300 --> to completion) are averaged and read into a list, or hash, in order.
or: Lines 1-1000, (1001-2000, 2001-3000 --> to completion) are averaged and read into a list, or hash, in order.
etc, etc.

>> 2. Calculate the difference between any given (x) range. In order to be
>> able to ask the program to average every 5, 10, 100, 1,000, or 10,000 etc.
>> --> until completion. This includes the need to deal with stray
>> remainders at the end of the file that aren't divisible by that initial
>> requested range.
>
> In other words: you would like to calculate a running/moving average, with
> window size as a parameter?

Yes.
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
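The grouping described above (lines 1-10, 11-20, and so on, averaged in order, with the leftover entries at the end still counted) can be sketched as follows. This is only a minimal illustration, not code from the thread: the function name `block_averages` and the sample `timings` list are made up for the example, and it assumes the whole file has already been read into a list of integers.

```python
def block_averages(values, size):
    """Average consecutive, non-overlapping blocks of `size` items.

    The final block may be shorter than `size`; it is averaged over the
    items it actually contains, so stray remainders are not dropped.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    averages = []
    for start in range(0, len(values), size):
        block = values[start:start + size]   # Slicing never runs past the end.
        averages.append(sum(block) / float(len(block)))
    return averages


# Hypothetical sample data: the five timings from the original post, twice.
timings = [173, 1685, 1152, 253, 1623, 173, 1685, 1152, 253, 1623]

# Try several granularities; 4 does not divide 10, so the last block has 2 items.
for size in (2, 4, 5):
    print("%d: %s" % (size, block_averages(timings, size)))
```

For a file of 3,245 entries and a size of 100, a function like this would return 33 averages, the last one covering only the final 45 entries.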
Re: [Tutor] Newbie Here -- Averaging & Adding Madness Over a Given (x) Range?!?!
> Eventually what I'll need to do is:
>
> 1. Index the file and/or count the lines, as to identify each line's
> positional relevance so that it can average any range of numbers that are
> sequential; one to one another.

In other words: you would like to down-sample your data? For example, reduce a sampling frequency from 1000 samples/second (1 kHz) to 100, by averaging every ten sequential data points?

> 2. Calculate the difference between any given (x) range. In order to be
> able to ask the program to average every 5, 10, 100, 1,000, or 10,000 etc.
> --> until completion. This includes the need to deal with stray remainders
> at the end of the file that aren't divisible by that initial requested
> range.

In other words: you would like to calculate a running/moving average, with window size as a parameter?
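The two questions above describe different calculations: down-sampling averages non-overlapping groups, while a running (moving) average slides a window along the data one item at a time. Here is a rough sketch of the second, assuming the data is already in a list. `moving_average` and the sample values are hypothetical names chosen for illustration, and the running-total approach is just one way to do it.

```python
def moving_average(values, window):
    """Return the average of every run of `window` consecutive items.

    Windows overlap: each one starts one item after the previous one, so
    a list of n items yields n - window + 1 averages.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        return []
    total = sum(values[:window])            # Sum of the first window.
    averages = [total / float(window)]
    for i in range(window, len(values)):
        # Slide right by one: add the new item, drop the oldest one.
        total += values[i] - values[i - window]
        averages.append(total / float(window))
    return averages


samples = [10, 20, 30, 40, 50]              # Made-up values.
print(moving_average(samples, 2))
```

Keeping a running total avoids re-adding the whole window at every step, which starts to matter when the window is large and the file has a million entries.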
Re: [Tutor] Newbie Here -- Averaging & Adding Madness Over a Given (x) Range?!?!
@Steven,

Thank you for the answers. I appreciate your understanding and patience; I understand that my post was (unintentionally) confusing, and probably irritating to the seasoned tutor list members. Your examples helped greatly, and were the push I needed.

Happy Friday, and thanks again,

Mike

On 02/14/2013 05:48 PM, Steven D'Aprano wrote:
> On 15/02/13 07:55, Michael McConachie wrote:
> [snip]
Re: [Tutor] Newbie Here -- Averaging & Adding Madness Over a Given (x) Range?!?!
@Bob @David -- I gave you all the other parts to give you background and context as it relates to my 'problem'. My apologies if it seems obfuscated. I took an hour to write that email, and revised it several times in an attempt to provide good information. Please disregard my OP.

On 02/14/2013 05:06 PM, bob gailer wrote:
> On 2/14/2013 3:55 PM, Michael McConachie wrote:
> [snip]
>
> I agree with Dave Angel - the specification is far from clear. Please
> clarify. Perhaps a simple example that goes from input to desired output.
Re: [Tutor] Newbie Here -- Averaging & Adding Madness Over a Given (x) Range?!?!
On 2/14/2013 3:55 PM, Michael McConachie wrote:
> [snip]

I agree with Dave Angel - the specification is far from clear. Please clarify. Perhaps a simple example that goes from input to desired output.

--
Bob Gailer
919-636-4239
Chapel Hill NC
Re: [Tutor] Newbie Here -- Averaging & Adding Madness Over a Given (x) Range?!?!
On 15/02/13 07:55, Michael McConachie wrote:

> Essentially:
>
> 1. I have a list of numbers that already exist in a file. I generate
>    this file by parsing info from logs.
> 2. Each line contains an integer on it (corresponding to the number of
>    milliseconds that it takes to complete a certain repeated task).
> 3. There are over a million entries in this file, one per line; at any
>    given time it can be just a few thousand, or more than a million.
>
> Example:
> ---
> 173
> 1685
> 1152
> 253
> 1623

A million entries sounds like a lot to you or me, but to your computer, it's not. When you start talking tens or hundreds of millions, that's possibly a lot.

Do you know how to read those numbers into a Python list? Here is the "baby step" way to do so:

    data = []             # Start with an empty list.
    f = open("filename")  # Obviously you have to use the actual file name.
    for line in f:        # Read the file one line at a time.
        num = int(line)   # Convert each line into an integer (whole number)
        data.append(num)  # and append it to the end of the list.
    f.close()             # Close the file when done.

Here's a more concise way to do it:

    with open("filename") as f:
        data = [int(line) for line in f]

Once you have that list of numbers, you can sum the whole lot:

    sum(data)

or just a range of the items:

    sum(data[:100])     # The first 100 items.
    sum(data[100:200])  # The second 100 items.
    sum(data[-50:])     # The last 50 items.
    sum(data[1000:])    # Item 1001 to the end. (See below.)
    sum(data[5:99:3])   # Every third item, starting at index 5 and
                        # ending at index 98.

This is called "slicing", and it is perhaps the most powerful and useful technique that Python gives you for dealing with lists. The rules, though, are not necessarily the most intuitive.

A slice is either a pair of numbers separated with a colon, inside the square brackets:

    data[start:end]

or a triple:

    data[start:end:step]

Any of these three numbers can be left out. The default values are:

    start=0
    end=length of the sequence being sliced
    step=1

They can also be negative. If start or end are negative, they are interpreted as "from the end" rather than "from the beginning".

Item positions are counted from 0, which will be very familiar to C programmers. The start index is included in the slice, the end position is excluded.

The model that you should think of is to imagine the sequence of items labelled with their index, starting from zero, and with a vertical line *between* each position. Here is a sequence of 26 items, showing the index in the first line and the value in the second:

    |0|1|2|3|4|5|6|7|8|9| ... |25|
    |a|b|c|d|e|f|g|h|i|j| ... |z |

When you take a slice, the cut is always made at the vertical line to the left of the index you give. So, if the above is called "letters", we have:

    letters[0:4]    # returns "abcd"
    letters[2:8]    # returns "cdefgh"
    letters[2:8:2]  # returns "ceg"
    letters[-3:]    # returns "xyz"

> Eventually what I'll need to do is:
>
> 1. Index the file and/or count the lines, as to identify each line's
>    positional relevance so that it can average any range of numbers that
>    are sequential; one to one another.

No need. Python already does that, automatically, when you read the data into a list.

> 2. Calculate the difference between any given (x) range. In order to be
>    able to ask the program to average every 5, 10, 100, 1,000, or 10,000
>    etc. --> until completion. This includes the need to deal with stray
>    remainders at the end of the file that aren't divisible by that
>    initial requested range.

I don't quite understand you here. First you say "difference", then you say "average". Can you show a sample of data, say, 10 values, and the sorts of typical calculations you want to perform, with the answers you expect to get?

For example, here are 10 numbers:

    103, 104, 105, 109, 111, 112, 115, 120, 123, 128

Here are the running averages of 3 values:

    (103+104+105)/3
    (104+105+109)/3
    (105+109+111)/3
    (109+111+112)/3
    (111+112+115)/3
    (112+115+120)/3
    (115+120+123)/3
    (120+123+128)/3

Is that what you mean? If so, then Python can deal with this trivially, using slicing. With your data stored in list "data", as above, I can say:

    for i in range(0, len(data)-2):  # The last window starts 3 from the end.
        print(sum(data[i:i+3]))

to print the running sums taking three items at a time.

The rest of your post just confuses me. Until you explain exactly what calculations you are trying to perform, I can't tell you how to perform them :-)

--
Steven
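The slicing rules above can be checked directly in the interpreter. The snippet below is a small sketch that reproduces the `letters` examples using the standard library's `string.ascii_lowercase`, then counts the windows of three in the ten sample numbers. The variable names `window` and `running` are just for illustration.

```python
import string

letters = string.ascii_lowercase      # "abcdefghijklmnopqrstuvwxyz"

# Start is included, end is excluded, and the step defaults to 1.
assert letters[0:4] == "abcd"
assert letters[2:8] == "cdefgh"
assert letters[2:8:2] == "ceg"
assert letters[-3:] == "xyz"          # Negative start counts from the end.
assert letters[:] == letters          # All three parts left out.

data = [103, 104, 105, 109, 111, 112, 115, 120, 123, 128]
window = 3

# A list of n items has n - window + 1 full windows, so the last start
# index is len(data) - window.
running = [sum(data[i:i + window]) / float(window)
           for i in range(len(data) - window + 1)]

print(len(running))                   # Number of running averages.
print(running[0])                     # (103 + 104 + 105) / 3
```

Counting the windows explicitly is a handy sanity check: ten numbers taken three at a time should give the eight averages listed above.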
Re: [Tutor] Newbie Here -- Averaging & Adding Madness Over a Given (x) Range?!?!
On 02/14/2013 03:55 PM, Michael McConachie wrote:
> Hello all,
>
> This is my first post here. I have tried to get answers from
> StackOverflow, but I realized quickly that I am too "green" for that
> environment. As such, I have purchased Beginning Python (2nd edition,
> Hetland) and also the $29.00 course available from
> learnpythonthehardway(dot)com. I have been reading fervently, and have
> enjoyed python -- very much. I can do all the basic printing, math,
> substitutions, etc. However, I am stuck when trying to combine all the new
> skills I have been learning over the past few weeks. Anyway, I was hoping
> to get some help with something NON-HOMEWORK related. (I swear.)
>
> I have a task that I have generalized due to the nature of what I am
> trying to do -- and its need to remain confidential.
>
> My end goal as described on SO was: "Calculating and Plotting the Average
> of every (X) items in a list of (Y) total", but for now I am only stuck on
> the actual addition, and/or averaging items -- in a serial sense, based on
> the relation to the previous number, average of numbers, etc being acted
> on. Not the actual plotting. (Plotting is pretty EZ.)

If you're stuck on the addition, why give us all the other parts? Your problem statement is very confused, and you don't show much actual code.

> Essentially:
>
> 1. I have a list of numbers that already exist in a file. I generate this
>    file by parsing info from logs.
> 2. Each line contains an integer on it (corresponding to the number of
>    milliseconds that it takes to complete a certain repeated task).
> 3. There are over a million entries in this file, one per line; at any
>    given time it can be just a few thousand, or more than a million.
>
> Example:
> ---
> 173
> 1685
> 1152
> 253
> 1623

So write a loop that reads this file into a list of ints, converting each line. Then you can tell us you've got a list of about a million ints.

> Eventually what I'll need to do is:
>
> 1. Index the file and/or count the lines, as to identify each line's
>    positional relevance so that it can average any range of numbers that
>    are sequential; one to one another.
> 2. Calculate the difference between any given (x) range. In order to be
>    able to ask the program to average every 5, 10, 100, 1,000, or 10,000
>    etc. --> until completion. This includes the need to deal with stray
>    remainders at the end of the file that aren't divisible by that
>    initial requested range. (ie: average some file with 3,245 entries by
>    100 --> not excluding the remaining 45 entries, in order to represent
>    the remainder.)
>
> So, looking above, transaction #1 took "173" milliseconds, while
> transaction #2 took 1685 milliseconds.
>
> Based on this, I need to figure out how to do two things:
>
> 1. Calculate the difference of each transaction, related to the one
>    before it AND record/capture the difference. (An array, list,
>    dictionary -- I don't care.)

What difference, what transaction, related how?

> 2. Starting with the very first line/entry, count the first (x number)
>    and average (x). I can obtain a "Happy medium" for what the
>    gradient/delta is between sets of 100 over the course of the aggregate.

What's an x-number? What, what, which, who?

> ie:
> ---
> Entries 1-100 = (eventualPlottedAvgTotalA)
> Entries 101-200 = (eventualPlottedAvgTotalB)
> Entries 201-300 = (eventualPlottedAvgTotalC)
> Entries 301-400 = (eventualPlottedAvgTotalD)
>
> From what I can tell, I don't need to indefinitely store the values, only
> pass them as they are processed (in order) to the plotter. I have tried
> the following example to sum a range of 5 entries from the above list of 5
> (which works), but I don't know how to dynamically pass the 5 at a time
> until completion, all the while retaining the calculated averages which
> will ultimately be passed to pyplot at a later time/date.
>
> What I have been able to figure out thus far is below.
>
> ex:
> Python 2.7.3 (default, Jul 24 2012, 10:05:38)
> [GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> plottedTotalA = ['173', '1685', '1152', '253', '1623']
> >>> sum(float(t) for t in plottedTotalA)
> 4886.0
>
> I received 2 answers from SO, but was unable to fully capture what they
> were trying to tell me. Unfortunately, I might need a "baby-step" /
> "Barney-style" mentor who is willing to guide me on this. I hope this
> makes sense to someone out there, and thank you in advance for any help
> that you can provide. I apologize in advance for being so thick if it's
> uber-EZ.

If you want to make a sublist out of the first 2 items in a list, you can use a slice (notice the colon):

    allvalues = [ 173, 1685, 1152, 263, 1623, 19 ]
    firsttwo = allvalues[0:2]

To get the 3rd such sublist, use

    othertwo = allvalues[4:6]

If you've made such a list, you can readily use sum directly on it:

    mysum = sum(othertwo)

--
DaveA
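Generalizing the slice idea above: if the sublists are numbered from 0, sublist number n of a given size starts at index n * size and ends just before index (n + 1) * size. A small sketch follows. `nth_sublist` is a hypothetical helper name, not something from the thread or the standard library.

```python
def nth_sublist(values, n, size):
    """Return block number `n` (counting from 0) of `size` consecutive items."""
    start = n * size
    return values[start:start + size]


allvalues = [173, 1685, 1152, 263, 1623, 19]

first_pair = nth_sublist(allvalues, 0, 2)   # Same as allvalues[0:2].
third_pair = nth_sublist(allvalues, 2, 2)   # Same as allvalues[4:6].

print(first_pair)
print(third_pair)
print(sum(third_pair))

# A slice whose end is not greater than its start is simply empty,
# so the end index must always be start + size, not the size itself.
assert allvalues[4:2] == []
```

Asking for a block past the end of the list returns an empty list rather than raising an error, which is convenient in a loop that walks over the whole file.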
[Tutor] Newbie Here -- Averaging & Adding Madness Over a Given (x) Range?!?!
Hello all,

This is my first post here. I have tried to get answers from StackOverflow, but I realized quickly that I am too "green" for that environment. As such, I have purchased Beginning Python (2nd edition, Hetland) and also the $29.00 course available from learnpythonthehardway(dot)com. I have been reading fervently, and have enjoyed python -- very much. I can do all the basic printing, math, substitutions, etc. However, I am stuck when trying to combine all the new skills I have been learning over the past few weeks. Anyway, I was hoping to get some help with something NON-HOMEWORK related. (I swear.)

I have a task that I have generalized due to the nature of what I am trying to do -- and its need to remain confidential.

My end goal as described on SO was: "Calculating and Plotting the Average of every (X) items in a list of (Y) total", but for now I am only stuck on the actual addition, and/or averaging items -- in a serial sense, based on the relation to the previous number, average of numbers, etc being acted on. Not the actual plotting. (Plotting is pretty EZ.)

Essentially:

1. I have a list of numbers that already exist in a file. I generate this file by parsing info from logs.
2. Each line contains an integer on it (corresponding to the number of milliseconds that it takes to complete a certain repeated task).
3. There are over a million entries in this file, one per line; at any given time it can be just a few thousand, or more than a million.

Example:
---
173
1685
1152
253
1623

Eventually what I'll need to do is:

1. Index the file and/or count the lines, as to identify each line's positional relevance so that it can average any range of numbers that are sequential; one to one another.
2. Calculate the difference between any given (x) range. In order to be able to ask the program to average every 5, 10, 100, 1,000, or 10,000 etc. --> until completion. This includes the need to deal with stray remainders at the end of the file that aren't divisible by that initial requested range. (ie: average some file with 3,245 entries by 100 --> not excluding the remaining 45 entries, in order to represent the remainder.)

So, looking above, transaction #1 took "173" milliseconds, while transaction #2 took 1685 milliseconds.

Based on this, I need to figure out how to do two things:

1. Calculate the difference of each transaction, related to the one before it AND record/capture the difference. (An array, list, dictionary -- I don't care.)
2. Starting with the very first line/entry, count the first (x number) and average (x). I can obtain a "Happy medium" for what the gradient/delta is between sets of 100 over the course of the aggregate.

ie:
---
Entries 1-100 = (eventualPlottedAvgTotalA)
Entries 101-200 = (eventualPlottedAvgTotalB)
Entries 201-300 = (eventualPlottedAvgTotalC)
Entries 301-400 = (eventualPlottedAvgTotalD)

From what I can tell, I don't need to indefinitely store the values, only pass them as they are processed (in order) to the plotter. I have tried the following example to sum a range of 5 entries from the above list of 5 (which works), but I don't know how to dynamically pass the 5 at a time until completion, all the while retaining the calculated averages which will ultimately be passed to pyplot at a later time/date.

What I have been able to figure out thus far is below.

ex:
Python 2.7.3 (default, Jul 24 2012, 10:05:38)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> plottedTotalA = ['173', '1685', '1152', '253', '1623']
>>> sum(float(t) for t in plottedTotalA)
4886.0

I received 2 answers from SO, but was unable to fully capture what they were trying to tell me. Unfortunately, I might need a "baby-step" / "Barney-style" mentor who is willing to guide me on this. I hope this makes sense to someone out there, and thank you in advance for any help that you can provide. I apologize in advance for being so thick if it's uber-EZ.

--
Mike
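Two of the pieces asked for above can be sketched under stated assumptions: the per-transaction difference, and block averages computed while reading, so that the raw values never need to be stored all at once. The function names are invented for this example, `io.StringIO` stands in for the real log file, and every line is assumed to hold exactly one integer (a blank line would raise `ValueError`).

```python
import io
from itertools import islice


def stream_block_averages(lines, size):
    """Yield the average of each consecutive block of `size` lines.

    `lines` is any iterable of strings holding one integer each, such as
    an open file. Only one block is held in memory at a time, and a
    short final block is averaged over the items it actually contains.
    """
    lines = iter(lines)
    while True:
        block = [int(line) for line in islice(lines, size)]
        if not block:                 # Nothing left to read.
            return
        yield sum(block) / float(len(block))


def consecutive_differences(values):
    """Return each value minus the value before it."""
    return [b - a for a, b in zip(values, values[1:])]


# Stand-in for the real file, using the five sample timings.
fake_file = io.StringIO(u"173\n1685\n1152\n253\n1623\n")

averages = list(stream_block_averages(fake_file, 2))
print(averages)
print(consecutive_differences([173, 1685, 1152, 253, 1623]))
```

With a real file, `stream_block_averages(open("filename"), 100)` would be the equivalent call, and the resulting list of averages is what would eventually be handed to the plotter.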