Re: [Tutor] Newbie Here -- Averaging Adding Madness Over a Given (x) Range?!?!

2013-02-15 Thread Michael J. McConachie
@Bob @David -- I gave you all the other parts to give you background
and context for my 'problem'.  My apologies if it seemed
obfuscated.  I took an hour to write that email, and revised it several
times in an attempt to provide good information.  Please disregard my OP.


On 02/14/2013 05:06 PM, bob gailer wrote:
 On 2/14/2013 3:55 PM, Michael McConachie wrote:
 [snip]

 I agree with Dave Angel - the specification is far from clear. Please
 clarify. Perhaps a simple example that goes from input to desired output.


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie Here -- Averaging Adding Madness Over a Given (x) Range?!?!

2013-02-15 Thread Michael J. McConachie
@Steven,

Thank you for the answers.  I appreciate your understanding and
patience; I understand that my post was (unintentionally) confusing and
probably irritating to any of the seasoned tutor list members.

Your examples helped greatly, and were the push I needed.  Happy Friday,
and thanks again,

Mike

On 02/14/2013 05:48 PM, Steven D'Aprano wrote:
 [snip]






Re: [Tutor] Newbie Here -- Averaging Adding Madness Over a Given (x) Range?!?!

2013-02-15 Thread Albert-Jan Roskam
[snip]
 Eventually what I'll need to do is:

 
 1.  Index the file and/or count the lines, as to identify each line's 
 positional relevance so that it can average any range of numbers that are 
 sequential; one to one another.

In other words: you would like to down-sample your data? For example, reduce a 
sampling frequency from 1000 samples/second (1 kHz) to 100 samples/second by averaging every 
ten sequential data points?

 2.  Calculate the difference between any given (x) range.  In order to be
 able to ask the program to average every 5, 10, 100, 1000, or 10,000 etc. --
 until completion.  This includes the need to deal with stray remainders at
 the end of the file that aren't divisible by that initial requested range.

In other words: you would like to calculate a running/moving average, with 
window size as a parameter?
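
The two readings differ: down-sampling averages non-overlapping blocks, while a moving average slides the window forward one item at a time. A minimal sketch of the second, with the window size as a parameter; `moving_average` is a hypothetical helper, and the values are assumed to be numbers already in a list. Rather than re-summing every window, it keeps a running total.

```python
def moving_average(values, window):
    """Average every run of `window` consecutive values.

    Returns len(values) - window + 1 averages, or [] if the window
    is not between 1 and len(values).
    """
    if window < 1 or window > len(values):
        return []
    total = sum(values[:window])
    averages = [total / float(window)]  # float() avoids Python 2 integer division
    for i in range(window, len(values)):
        # Slide the window one step: add the value entering, drop the one leaving.
        total += values[i] - values[i - window]
        averages.append(total / float(window))
    return averages


print(moving_average([1, 2, 3, 4, 5], 2))  # [1.5, 2.5, 3.5, 4.5]
```

With integer millisecond timings the running total stays exact; with floats, small rounding errors can accumulate over millions of steps.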


Re: [Tutor] Newbie Here -- Averaging Adding Madness Over a Given (x) Range?!?!

2013-02-15 Thread Michael J. McConachie
On 02/15/2013 04:03 PM, Albert-Jan Roskam wrote:
 [snip]
 Eventually what I'll need to do is:
 1.  Index the file and/or count the lines, as to identify each line's 
 positional relevance so that it can average any range of numbers that are 
 sequential; one to one another.
 In other words: you would like to down-sample your data? For example, reduce 
 a sampling frequency from 1000 samples/second (1 kHz) to 100 samples/second by averaging 
 every ten sequential data points?
I think so.  When I said 'index' in my OP, I wasn't sure how to explain
that each line's position would be used to place it in its group of (x).
(That's all I meant.)  I am trying to identify gradient(s) in order to
determine performance 'thresholds', if they exist.  We are noticing that
as the number of tasks already performed increases, the performance of a
certain repeated task noticeably decreases.  I am trying to find that
point/elbow in the performance curve.  I have been asked to identify and
plot the overall 'average performance' with varying levels of granularity
(averaging by 10, by 100, by 1000, etc.).

The file I mentioned in my OP contains the measurement of time it takes
to complete these repeated tasks.  Each entry is on its own line.  The
recorded data is in literal order of completion.  I am averaging those
(ms time entries) in sets of (x) to keep from having to compute the
difference in time for each completed task individually.

  ie:
   Lines 1-10, (11-20, 21-30 -- to completion) are averaged and
read into a list, or hash in order.
  or:
   Lines 1-100, (101-200, 201-300 -- to completion) are averaged
and read into a list, or hash in order.
  or:
   Lines 1-1000, (1001-2000, 2001-3000 -- to completion) are
averaged and read into a list, or hash in order.

   etc, etc.
 2.  Calculate the difference between any given (x) range.  In order to be
 able to ask the program to average every 5, 10, 100, 1000, or 10,000 etc. --
 until completion.  This includes the need to deal with stray remainders at
 the end of the file that aren't divisible by that initial requested range.
 In other words: you would like to calculate a running/moving average, with 
 window size as a parameter?
Yes.



[Tutor] Newbie Here -- Averaging Adding Madness Over a Given (x) Range?!?!

2013-02-14 Thread Michael McConachie
Hello all,

This is my first post here.  I have tried to get answers from StackOverflow, 
but I realized quickly that I am too green for that environment.  As such, I 
have purchased Beginning Python (2nd edition, Hetland) and also the $29.00 
course available from learnpythonthehardway(dot)com.  I have been reading 
fervently, and have enjoyed python -- very much.  I can do all the basic 
printing, math, substitutions, etc.  However, I am stuck when trying to 
combine all the new skills I have been learning over the past few weeks.  
Anyway, I was hoping to get some help with something NON-HOMEWORK related. (I 
swear.)  

I have a task that I have generalized due to the nature of what I am trying to 
do -- and its need to remain confidential.  

My end goal as described on SO was: Calculating and Plotting the Average of 
every (X) items in a list of (Y) total, but for now I am only stuck on the 
actual addition, and/or averaging items -- in a serial sense, based on the 
relation to the previous number, average of numbers, etc being acted on.  Not 
the actual plotting. (Plotting is pretty EZ.)

Essentially:

1.  I have a list of numbers that already exist in a file.  I generate this 
file by parsing info from logs.
2.  Each line contains an integer on it (corresponding to the number of 
milliseconds that it takes to complete a certain repeated task).
3.  There are over a million entries in this file, one per line; at any given 
time it can be just a few thousand, or more than a million.

   Example:
   ---
   173
   1685
   1152
   253
   1623

Eventually what I'll need to do is:

1.  Index the file and/or count the lines, as to identify each line's 
positional relevance so that it can average any range of numbers that are 
sequential; one to one another.
2.  Calculate the difference between any given (x) range.  In order to be able 
to ask the program to average every 5, 10, 100, 1000, or 10,000 etc. -- until 
completion.  This includes the need to deal with stray remainders at the end 
of the file that aren't divisible by that initial requested range. 

(i.e., average some file with 3,245 entries by 100 without excluding the 
remaining 45 entries, so that the remainder is represented.)
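
Items 1 and 2 together amount to averaging consecutive, non-overlapping blocks of x entries while keeping the short final block. A minimal sketch, assuming the timings are already in a Python list of numbers; `block_averages` is a hypothetical name. It runs on Python 3 and on the 2.7 used in this thread.

```python
def block_averages(values, size):
    """Average consecutive, non-overlapping blocks of `size` values.

    The last block may be shorter than `size`; it is averaged over
    however many values it actually holds, so no entries are dropped.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    averages = []
    for start in range(0, len(values), size):
        block = values[start:start + size]  # a slice past the end just stops early
        averages.append(sum(block) / float(len(block)))
    return averages


# The five sample timings from above, averaged two at a time:
timings = [173, 1685, 1152, 253, 1623]
print(block_averages(timings, 2))  # [929.0, 702.5, 1623.0]
```

For a file of 3,245 entries and a block size of 100, this returns 33 averages, the last of which covers the remaining 45 entries.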

So, looking above, transaction #1 took 173 milliseconds, while transaction #2 
took 1685 milliseconds.  

Based on this, I need to figure out how to do two things:

1.  Calculate the difference of each transaction, related to the one before it 
AND record/capture the difference. (An array, list, dictionary -- I don't 
care.) 
2.  Starting with the very first line/entry, count the first (x number) and 
average them, so that I can obtain a happy medium for what the gradient/delta is 
between sets of 100 over the course of the aggregate.

   ie:
   ---
   Entries 1-100 = (eventualPlottedAvgTotalA)
   Entries 101-200 = (eventualPlottedAvgTotalB)
   Entries 201-300 = (eventualPlottedAvgTotalC)
   Entries 301-400 = (eventualPlottedAvgTotalD)
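
For item 1 above (each transaction compared with the one before it), a minimal sketch, assuming "difference" means the later time minus the earlier one; `consecutive_differences` is a hypothetical name. A list of N timings yields N - 1 differences.

```python
def consecutive_differences(values):
    """Return values[i] - values[i - 1] for every i >= 1, in order."""
    # zip pairs each item with the one after it and stops at the shorter input.
    return [current - previous for previous, current in zip(values, values[1:])]


timings = [173, 1685, 1152, 253, 1623]
print(consecutive_differences(timings))  # [1512, -533, -899, 1370]
```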

From what I can tell, I don't need to indefinitely store the values, only pass 
them as they are processed (in order) to the plotter. I have tried the 
following example to sum a range of 5 entries from the above list of 5 (which 
works), but I don't know how to dynamically pass the 5 at a time until 
completion, all the while retaining the calculated averages which will 
ultimately be passed to pyplot at a later time/date.

What I have been able to figure out thus far is below.  

ex:

   Python 2.7.3 (default, Jul 24 2012, 10:05:38) 
   [GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
   Type "help", "copyright", "credits" or "license" for more information.
   >>> plottedTotalA = ['173', '1685', '1152', '253', '1623']
   >>> sum(float(t) for t in plottedTotalA)
   4886.0
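
Since the averages only need to be passed along in order, the file does not have to be read into one big list first. A minimal generator-based sketch, assuming one number per line; `stream_block_averages` and the file name `timings.txt` are hypothetical. It holds one block in memory at a time and skips blank lines.

```python
import itertools


def stream_block_averages(lines, size):
    """Yield the average of each consecutive block of `size` numbers.

    `lines` may be an open file or any iterable of text lines. The last
    block may be shorter than `size` and is averaged over what it holds.
    """
    numbers = (float(line) for line in lines if line.strip())  # skip blank lines
    while True:
        block = list(itertools.islice(numbers, size))  # take the next `size` numbers
        if not block:  # the input is exhausted
            break
        yield sum(block) / len(block)


sample = ["173\n", "1685\n", "1152\n", "253\n", "1623\n"]
print(list(stream_block_averages(sample, 5)))  # [977.2]

# With a real file (hypothetical name):
# with open("timings.txt") as f:
#     averages = list(stream_block_averages(f, 100))
```

The resulting `averages` list is an ordinary Python list, in file order, that could later be handed to the plotting code.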

I received 2 answers from SO, but was unable to fully capture what they were 
trying to tell me.  Unfortunately, I might need a baby-step / Barney-style 
mentor who is willing to guide me on this.  I hope this makes sense to someone 
out there, and thank you in advance for any help that you can provide.  I 
apologize in advance for being so thick if it's uber-EZ.

-- Mike


Re: [Tutor] Newbie Here -- Averaging Adding Madness Over a Given (x) Range?!?!

2013-02-14 Thread Dave Angel

On 02/14/2013 03:55 PM, Michael McConachie wrote:

Hello all,

This is my first post here.  I have tried to get answers from StackOverflow, but I 
realized quickly that I am too green for that environment.  As such, I have 
purchased Beginning Python (2nd edition, Hetland) and also the $29.00 course available 
from learnpythonthehardway(dot)com.  I have been reading fervently, and have enjoyed 
python -- very much.  I can do all the basic printing, math, substitutions, etc.  
However, I am stuck when trying to combine all the new skills I have been learning over 
the past few weeks.  Anyway, I was hoping to get some help with something NON-HOMEWORK 
related. (I swear.)

I have a task that I have generalized due to the nature of what I am trying to 
do -- and its need to remain confidential.

My end goal as described on SO was: Calculating and Plotting the Average of every 
(X) items in a list of (Y) total, but for now I am only stuck on the actual 
addition, and/or averaging items -- in a serial sense, based on the relation to the 
previous number, average of numbers, etc being acted on.  Not the actual plotting. 
(Plotting is pretty EZ.)



If you're stuck on the addition, why give us all the other parts?  Your 
problem statement is very confused, and you don't show much actual code.



Essentially:

1.  I have a list of numbers that already exist in a file.  I generate this 
file by parsing info from logs.
2.  Each line contains an integer on it (corresponding to the number of 
milliseconds that it takes to complete a certain repeated task).
3.  There are over a million entries in this file, one per line; at any given 
time it can be just a few thousand, or more than a million.

Example:
---
173
1685
1152
253
1623


So write a loop that reads this file into a list of ints, converting 
each line.  Then you can tell us you've got a list of about a million ints.





Eventually what I'll need to do is:

1.  Index the file and/or count the lines, as to identify each line's 
positional relevance so that it can average any range of numbers that are 
sequential; one to one another.
2.  Calculate the difference between any given (x) range.  In order to be able to 
ask the program to average every 5, 10, 100, 1000, or 10,000 etc. -- until 
completion.  This includes the need to deal with stray remainders at the end of 
the file that aren't divisible by that initial requested range.

(i.e., average some file with 3,245 entries by 100 without excluding the 
remaining 45 entries, so that the remainder is represented.)

So, looking above, transaction #1 took 173 milliseconds, while transaction #2 
took 1685 milliseconds.

Based on this, I need to figure out how to do two things:

1.  Calculate the difference of each transaction, related to the one before it 
AND record/capture the difference. (An array, list, dictionary -- I don't care.)


What difference, what transaction, related how?



2.  Starting with the very first line/entry, count the first (x number) and average them, 
so that I can obtain a happy medium for what the gradient/delta is between sets of 
100 over the course of the aggregate.


What's an x-number?  What, what, which, who?



ie:
---
Entries 1-100 = (eventualPlottedAvgTotalA)
Entries 101-200 = (eventualPlottedAvgTotalB)
Entries 201-300 = (eventualPlottedAvgTotalC)
Entries 301-400 = (eventualPlottedAvgTotalD)


From what I can tell, I don't need to indefinitely store the values, only pass 
them as they are processed (in order) to the plotter. I have tried the 
following example to sum a range of 5 entries from the above list of 5 (which 
works), but I don't know how to dynamically pass the 5 at a time until 
completion, all the while retaining the calculated averages which will 
ultimately be passed to pyplot at a later time/date.


What I have been able to figure out thus far is below.

ex:

Python 2.7.3 (default, Jul 24 2012, 10:05:38)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> plottedTotalA = ['173', '1685', '1152', '253', '1623']
>>> sum(float(t) for t in plottedTotalA)
4886.0

I received 2 answers from SO, but was unable to fully capture what they were trying to tell me.  
Unfortunately, I might need a baby-step / Barney-style mentor who is 
willing to guide me on this.  I hope this makes sense to someone out there, and thank you in 
advance for any help that you can provide.  I apologize in advance for being so thick if it's 
uber-EZ.




If you want to make a sublist out of the first 2 items in a list, you 
can use a slice  (notice the colon):


allvalues = [ 173, 1685, 1152, 263, 1623, 19 ]
firsttwo = allvalues[0:2]

To get the 3rd such sublist, use
othertwo = allvalues[4:6]


If you've made such a list, you can readily use sum directly on it:
  mysum = sum(othertwo)
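
Generalizing the pattern: counting sublists from 1, the n-th sublist of `size` items starts at index (n - 1) * size. A minimal sketch; `nth_sublist` is a hypothetical helper, not something built into Python.

```python
def nth_sublist(values, n, size):
    """Return the n-th run (counting from 1) of `size` consecutive items."""
    start = (n - 1) * size              # index where the n-th sublist begins
    return values[start:start + size]   # the end index is excluded


allvalues = [173, 1685, 1152, 263, 1623, 19]
print(nth_sublist(allvalues, 1, 2))       # [173, 1685]
print(nth_sublist(allvalues, 3, 2))       # [1623, 19]
print(sum(nth_sublist(allvalues, 3, 2)))  # 1642
```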



--
DaveA

Re: [Tutor] Newbie Here -- Averaging Adding Madness Over a Given (x) Range?!?!

2013-02-14 Thread Steven D'Aprano

On 15/02/13 07:55, Michael McConachie wrote:


Essentially:

1.  I have a list of numbers that already exist in a file.  I generate this 
file by parsing info from logs.
2.  Each line contains an integer on it (corresponding to the number of 
milliseconds that it takes to complete a certain repeated task).
3.  There are over a million entries in this file, one per line; at any given 
time it can be just a few thousand, or more than a million.

Example:
---
173
1685
1152
253
1623



A million entries sounds like a lot to you or me, but to your computer, it's 
not. When you start talking tens or hundreds of millions, that's possibly a lot.

Do you know how to read those numbers into a Python list? Here is the baby 
step way to do so:


data = []  # Start with an empty list.
f = open(filename)  # Obviously you have to use the actual file name.
for line in f:  # Read the file one line at a time.
    num = int(line)  # Convert each line into an integer (whole number)
    data.append(num)  # and append it to the end of the list.
f.close()  # Close the file when done.


Here's a more concise way to do it:

with open(filename) as f:
    data = [int(line) for line in f]
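
If the generated file might contain blank lines (an assumption; a trailing empty line is common in generated files), `int(line)` raises `ValueError` on them. A minimal variant that skips them; `read_timings` is a hypothetical name, and the temporary file only stands in for the real one.

```python
import os
import tempfile


def read_timings(filename):
    """Read one integer per line, skipping blank lines such as a trailing one."""
    with open(filename) as f:
        return [int(line) for line in f if line.strip()]


# Demonstration: a throwaway file stands in for the real one.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as tmp:
    tmp.write("173\n1685\n1152\n253\n1623\n\n")  # note the blank last line
timings = read_timings(path)
os.remove(path)
print(timings)  # [173, 1685, 1152, 253, 1623]
```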



Once you have that list of numbers, you can sum the whole lot:

sum(data)


or just a range of the items:

sum(data[:100])  # The first 100 items.

sum(data[100:200])  # The second 100 items.

sum(data[-50:])  # The last 50 items.

sum(data[1000:])  # Item 1001 to the end.  (See below.)

sum(data[5:99:3])  # Every third item, starting at index 5 and ending at index 98.



This is called slicing, and it is perhaps the most powerful and useful 
technique that Python gives you for dealing with lists. The rules, though, are not 
necessarily the most intuitive.


A slice is either a pair of numbers separated with a colon, inside the square 
brackets:

data[start:end]

or a triple:

data[start:end:step]

Any of these three numbers can be left out. The default values are:

start=0
end=length of the sequence being sliced
step=1

They can also be negative. If start or end is negative, it is counted from the 
end rather than from the beginning.

Item positions are counted from 0, which will be very familiar to C 
programmers. The start index is included in the slice, the end position is 
excluded.

The model that you should think of is to imagine the sequence of items labelled 
with their index, starting from zero, and with a vertical line *between* each 
position. Here is a sequence of 26 items, showing the index in the first line 
and the value in the second:


|0|1|2|3|4|5|6|7|8|9| ... |25|
|a|b|c|d|e|f|g|h|i|j| ... |z |

When you take a slice, each index cuts the sequence at the left edge of the item with 
that number. So, if the above is called letters, we have:

letters[0:4]  # returns abcd

letters[2:8]  # returns cdefgh

letters[2:8:2]  # returns ceg

letters[-3:]  # returns xyz
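
The letters examples can be checked directly. A small sketch using the standard `string.ascii_lowercase` constant for the 26-letter sequence (a string rather than a list, but slicing works the same way).

```python
import string

letters = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz', 26 items

# The slices from the text; a string slices exactly like a list.
print(letters[0:4])    # abcd
print(letters[2:8])    # cdefgh
print(letters[2:8:2])  # ceg
print(letters[-3:])    # xyz

# Omitted values take their defaults: start=0, end=len(letters), step=1.
print(letters[:4] == letters[0:4])     # True
print(letters[23:] == letters[23:26])  # True
print(letters[::1] == letters)         # True
```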




Eventually what I'll need to do is:

1.  Index the file and/or count the lines, as to identify each line's 
positional relevance so that it can average any range of numbers that are 
sequential; one to one another.



No need. Python already does that, automatically, when you read the data into a 
list.
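
To illustrate: once the numbers are in a list, a value's position is simply its index, and the built-in `enumerate` supplies 1-based line numbers if you want them. The five values are the sample from the original post.

```python
data = [173, 1685, 1152, 253, 1623]  # the sample values from the original post

# A list already remembers order: data[0] came from line 1, data[1] from line 2, ...
print(data[1])  # 1685

# enumerate pairs each value with its position; start=1 matches file line numbers.
numbered = list(enumerate(data, start=1))
for line_number, millis in numbered:
    print("line %d: %d ms" % (line_number, millis))
```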




2.  Calculate the difference between any given (x) range.  In order to be able to 
ask the program to average every 5, 10, 100, 100, or 10,000 etc. --  until 
completion.  This includes the need to dealing with stray remainders at the end of 
the file that aren't divisible by that initial requested range.


I don't quite understand you here. First you say difference, then you say 
average. Can you show a sample of data, say, 10 values, and the sorts of typical 
calculations you want to perform, with the answers you expect to get?


For example, here are 10 numbers:


103, 104, 105, 109, 111, 112, 115, 120, 123, 128


Here are the running averages of 3 values:

(103+104+105)/3

(104+105+109)/3

(105+109+111)/3

(109+111+112)/3

(111+112+115)/3

(112+115+120)/3

(115+120+123)/3

(120+123+128)/3


Is that what you mean? If so, then Python can deal with this trivially, using slicing. 
With your data stored in list data, as above, I can say:


for i in range(len(data) - 3 + 1):  # The last window starts 3 from the end.
    print(sum(data[i:i+3]))


to print the running sums taking three items at a time.
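
To get the running averages themselves rather than the sums, divide each window's sum by the window size. A minimal sketch using the ten numbers above; `float()` avoids integer division under Python 2.7, which this thread uses.

```python
numbers = [103, 104, 105, 109, 111, 112, 115, 120, 123, 128]
window = 3

# One average per window; the last window starts at len(numbers) - window.
running_averages = [
    sum(numbers[i:i + window]) / float(window)
    for i in range(len(numbers) - window + 1)
]
print(len(running_averages))  # 8, one per expression listed above
print(running_averages[:2])   # [104.0, 106.0]
```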



The rest of your post just confuses me. Until you explain exactly what 
calculations you are trying to perform, I can't tell you how to perform them :-)




--
Steven


Re: [Tutor] Newbie Here -- Averaging Adding Madness Over a Given (x) Range?!?!

2013-02-14 Thread bob gailer

On 2/14/2013 3:55 PM, Michael McConachie wrote:
[snip]

I agree with Dave Angel - the specification is far from clear. Please 
clarify. Perhaps a simple example that goes from input to desired output.


--
Bob Gailer
919-636-4239
Chapel Hill NC
