Re: [Tutor] Iterating Lines in File and Export Results

2014-10-03 Thread John Doe
Alan, Peter, et al:

Thank you all very much! Staring at this problem for hours was driving
me crazy, and I really appreciate the time you all spent looking
into my silly error -- I have thoroughly reviewed both responses
and they make perfect sense (*sigh of relief*).



On Thu, Oct 2, 2014 at 6:08 PM, Peter Otten __pete...@web.de wrote:
 John Doe wrote:

 Hello List,
 I am in need of your assistance. I have a text file with random words
 in it. I want to write all the lines to a new file. Additionally, I am
 using Python 2.7 on Ubuntu 12.04:

 Here is my code:

 def loop_extract():
     with open('words.txt', 'r') as f:
         for lines in f:

 The name `lines` is misleading, you are reading one line at a time.

             #print lines (I confirmed that each line is successfully
             #printed)
             with open('export.txt', 'w') as outf:
                 outf.write(lines)
                 #outf.write(lines)
                 #outf.write('{}\n'.format(lines))
                 #outf.write('{}\n'.format(line for line in lines))


 For some reason, the second file only contains the last line from the
 original file -- I have tried multiple variations (.read, .readlines,
 .writelines, other examples preceded by comment from above and many
 more) and tried to use the module, fileinput, but I still get the same
 results.

 Every time the line

 with open('export.txt', 'w') as outf:

 is executed the file export.txt is truncated:

 https://docs.python.org/dev/library/functions.html#open

 To avoid the loss of data open the file once, outside the loop:

 with open('words.txt') as infile, open('export.txt', 'w') as outfile:
     for line in infile:
         outfile.write(line)
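
To see the truncation concretely, here is a minimal sketch that runs both versions against a throwaway directory. The three sample words, the output file names and the two function names are placeholders for illustration, not part of the original post.

```python
import os
import tempfile

def copy_reopening(src, dst):
    # Buggy version: mode 'w' truncates dst on every pass through the loop.
    with open(src) as infile:
        for line in infile:
            with open(dst, 'w') as outfile:
                outfile.write(line)

def copy_once(src, dst):
    # Fixed version: dst is opened (and truncated) exactly once.
    with open(src) as infile, open(dst, 'w') as outfile:
        for line in infile:
            outfile.write(line)

# Build a small input file in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'words.txt')
with open(src, 'w') as f:
    f.write('alpha\nbeta\ngamma\n')

bad = os.path.join(workdir, 'bad.txt')
good = os.path.join(workdir, 'good.txt')
copy_reopening(src, bad)
copy_once(src, good)

with open(bad) as f:
    bad_text = f.read()
with open(good) as f:
    good_text = f.read()

print(repr(bad_text))   # only the last line survives
print(repr(good_text))  # all three lines are present
```

If the output file must be reopened inside the loop for some reason, mode 'a' (append) avoids the truncation, at the cost of reopening the file for every line.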


 I do understand there is another way to copy the file over, but to
 provide additional background information on my purpose -- I want to
 read a file and save successful regex matches to a file; exporting
 specific data. There doesn't appear to be anything wrong with my
 expression as it prints the expected results without failure. I then
 decided to just write the export function by itself in its basic form,
 per the code above, which the same behavior occurred;

 That is a good approach! Reduce the code until only the source of the
 problem is left.

 only copying the
 last line. I've googled for hours and, unfortunately, at loss.

 I do that too, but not for hours ;)

 I want to read a file and save successful regex matches to a file;
 exporting specific data.

 An experienced user of Python might approach this scenario with a generator:

 def process_lines(infile):
     for line in infile:
         line = process(line) # your line processing
         if meets_condition(line): # your filter condition
             yield line

 with open('words.txt') as infile:
     with open('export.txt', 'w') as outfile:
         outfile.writelines(
             process_lines(infile))
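
For the regex case mentioned above, the generator might look roughly like this. This is only a sketch written for Python 3: the pattern, the sample lines and the name matching_lines are made up for illustration, and io.StringIO objects stand in for the real words.txt and export.txt.

```python
import io
import re

# Hypothetical pattern: keep lines that contain a word ending in "ing".
PATTERN = re.compile(r'\b\w+ing\b')

def matching_lines(infile, pattern):
    """Yield only the lines in which the pattern is found."""
    for line in infile:
        if pattern.search(line):
            yield line

# In-memory stand-ins for the input and output files.
infile = io.StringIO('walking\ntable\nsinging\nchair\n')
outfile = io.StringIO()

# writelines() consumes the generator one line at a time.
outfile.writelines(matching_lines(infile, PATTERN))
result = outfile.getvalue()
print(result)
```

Because the generator yields lines lazily, the whole input never has to be held in memory, and the output file is still opened only once.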


 ___
 Tutor maillist  -  Tutor@python.org
 To unsubscribe or change subscription options:
 https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] could somebody please explain...

2014-10-03 Thread Steven D'Aprano
On Wed, Oct 01, 2014 at 09:43:29AM -0700, Clayton Kirkwood wrote:

 In an effort to learn and teach, I present a simple program which measures
 the time it takes to run the various add functions, with the results appended:

Well done for making the effort! Now I'm going to tell you all the 
things you've done wrong! Sorry.

But seriously, I am very pleased to see you making the effort to develop 
this on your own, but *accurately* timing fast-running code on modern 
computers is very tricky.

The problem is, when you run some code, it isn't the only program 
running! The operating system is running, and these days all computers 
are multi-tasking, which means that anything up to hundreds of other 
programs could be running at the same time. At any one instant, 
most of them will be idle, doing nothing, but there's no way to be 
sure.

Furthermore, there are now complexities with CPU caches. Running a bit 
of code will be much slower the first time, since it is not in the CPU 
cache. If the code is too big, it won't fit in the cache.

The end result is that when you time how long a piece of code takes to 
run, there will always be two components:

- the actual time taken for your code to run;

- random noise caused by CPU cache effects, other processes running, 
the operating system, your anti-virus suddenly starting a scan in the 
middle of the run, etc.

The noise can be quite considerable, possibly a few seconds. Now 
obviously if your code took ten minutes to run, then a few seconds 
either way is no big deal. But imagine that your timing test 
says that it took 2 seconds. That could mean:

- 0.001 seconds for your code, and 1.999 seconds worth of noise;

- 1.999 seconds for your code, and 0.001 seconds worth of noise;

- or anything in between.

That measurement is clearly quite useless.

Does this mean that timing Python code is impossible? No, not really, 
but you have to do it carefully. The best way is to use Python's 
timeit module, which is carefully crafted to be as accurate as 
possible. First I'll show some results with timeit, then come back for a 
second post where I explain what you can do to be (nearly) as accurate.

I'm going to compare four different ways of adding two numbers:

(1) Using the + operator

(2) Using operator.add

(3) Using operator.__add__

(4) Using a hand-written function, made with lambda


Here's the plus operator: from the command shell, I tell Python to use 
the timeit module to time some code. I give it some setup code to 
initialise two variables, then I time adding them together:

[steve@ando ~]$ python3.3 -m timeit -s "x = 1; y = 2" "x + y"
10000000 loops, best of 3: 0.0971 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "x = 1; y = 2" "x + y"
10000000 loops, best of 3: 0.0963 usec per loop


So timeit measures how long it takes to run x + y ten million times. 
It does that three times, and picks the fastest of the three. The 
fastest will have the least amount of noise. I ran it twice, and the two 
results are fairly close: 0.0971 microseconds, and 0.0963 microseconds.
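
The same "best of three" measurement can be made from inside a script with timeit.repeat. This is a rough sketch; the loop count of 100000 is an arbitrary choice to keep the run short, whereas the command-line tool picks its own count.

```python
import timeit

# Each of the 3 repeats runs the statement `number` times and
# returns the total time in seconds for that repeat.
number = 100000
timings = timeit.repeat(stmt='x + y', setup='x = 1; y = 2',
                        repeat=3, number=number)

# The fastest repeat is the one least disturbed by noise.
best = min(timings)
usec_per_loop = best / number * 1e6
print('best of 3: %.4f usec per loop' % usec_per_loop)
```

Taking the minimum rather than the mean follows the reasoning above: noise can only make a run slower, never faster.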


[steve@ando ~]$ python3.3 -m timeit -s "x = 1; y = 2" -s "import operator" "operator.add(x, y)"
1000000 loops, best of 3: 0.369 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "x = 1; y = 2" -s "import operator" "operator.add(x, y)"
1000000 loops, best of 3: 0.317 usec per loop


This time I use operator.add, and get a speed of about 0.3 microseconds. 
So operator.add is about three times slower than the + operator.


[steve@ando ~]$ python3.3 -m timeit -s "x = 1; y = 2" -s "import operator" "operator.__add__(x, y)"
1000000 loops, best of 3: 0.296 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "x = 1; y = 2" -s "import operator" "operator.__add__(x, y)"
1000000 loops, best of 3: 0.383 usec per loop

This time I use operator.__add__, and get about the same result as 
operator.add. You can see the variability in the results: 0.296 to 0.383 
microseconds, a variation of about 30%.


[steve@ando ~]$ python3.3 -m timeit -s "x = 1; y = 2" -s "add = lambda a,b: a+b" "add(x, y)"
1000000 loops, best of 3: 0.296 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "x = 1; y = 2" -s "add = lambda a,b: a+b" "add(x, y)"
1000000 loops, best of 3: 0.325 usec per loop

Finally, I try it with a hand-made function using lambda, and I get 
about the same 0.3 microseconds again, with considerable variability.

Of course, the results you get on your computer may be completely 
different.
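
If you would rather run all four comparisons in one go, something along these lines should work. It is a sketch only: the labels, the shared setup string and the loop count are my own choices for illustration, and the numbers it prints will vary from machine to machine and run to run.

```python
import timeit

# One setup string shared by all four statements.
SETUP = 'import operator; x = 1; y = 2; add = lambda a, b: a + b'

candidates = [
    ('+ operator', 'x + y'),
    ('operator.add', 'operator.add(x, y)'),
    ('operator.__add__', 'operator.__add__(x, y)'),
    ('lambda', 'add(x, y)'),
]

number = 200000  # arbitrary loop count, kept small so the script is quick
results = {}
for label, stmt in candidates:
    # Best of three repeats, converted to microseconds per loop.
    best = min(timeit.repeat(stmt, setup=SETUP, repeat=3, number=number))
    results[label] = best / number * 1e6
    print('%-18s %.4f usec per loop' % (label, results[label]))
```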



More to follow...




-- 
Steven


Re: [Tutor] could somebody please explain...

2014-10-03 Thread Steven D'Aprano
On Wed, Oct 01, 2014 at 09:43:29AM -0700, Clayton Kirkwood wrote:

 # program to test time and count options
 
 import datetime,operator, sys
 from datetime import time, date, datetime
 date = datetime.now()
 dayofweek = date.strftime("%a, %b")
 print("Today is", dayofweek, date.day, "at ", date.time())
 
 start = 0
 count_max=int(input("give me a number"))
 start_time = datetime.now()
 
 print( start_time )
 while start < count_max:
     start=start + 1
 
 end_time = datetime.now()
 print( "s=s+1 time difference is:", (end_time - start_time) )


The first problem you have here is that you are not 
actually timing how long it takes to add start + 1. 
You're actually timing eight things:

- lookup the value of start;
- lookup the value of count_max;
- check whether the first is less than the second;
- decide whether to loop, or exit the loop;
- if we're still inside the loop, lookup start again;
- add 1 to it;
- store the result in start; 
- jump back to the top of the loop.


So the results you get don't tell you much about the speed of start+1.
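
One way to see those extra steps for yourself is to disassemble the loop with the dis module. This is a sketch assuming Python 3.4 or later (for dis.get_instructions); count_up is a hypothetical wrapper around the loop, and the exact opcode names differ between Python versions.

```python
import dis

def count_up(count_max):
    # The same loop as in the program above, wrapped in a function.
    start = 0
    while start < count_max:
        start = start + 1
    return start

# Each instruction is one step the interpreter performs: loads,
# the comparison, the jump, the addition, the store, and so on.
instructions = list(dis.get_instructions(count_up))
for ins in instructions:
    print(ins.opname, ins.argrepr)
```

Only one of the printed instructions is the addition itself; the rest is the bookkeeping that the timing also includes.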

Analogy: you want to know how long it takes you to drive to work in the 
morning. So you wake up, eat breakfast, brush your teeth, start the 
stopwatch, have a shower, get dressed, get in the car, drive to the gas 
station, fill up, buy a newspaper, and drive the rest of the way to 
work, and finally stop the stopwatch. The time you get is accurate neither 
as driving time nor as the total time it takes to get to work.

Ideally, we want to do as little extra work as possible inside the 
timing loop, so we can get a figure as close as possible to the time 
actually taken by + as we can.

The second problem is that you are using datetime.now() as your clock. 
That's not a high-precision clock. It might only be accurate to a 
second, or a millisecond. It certainly isn't accurate enough to measure 
a single addition:

py> from datetime import datetime
py> x = 1
py> t = datetime.now(); x + 1; datetime.now() - t
2
datetime.timedelta(0, 0, 85)


This tells me that it supposedly took 85 microseconds to add two 
numbers, but as I showed before with timeit, the real figure is closer 
to 0.09 microseconds. That's a lot of noise! About 85000% noise!

Unfortunately, it is tricky to know which clock to use. On Windows, 
time.clock() used to be the best one; on Linux, time.time() was the 
best. Starting in Python 3.3, there are a bunch more accurate clocks in 
the time module. But if you use the timeit module, it already picks the 
best clock for the job. But if in doubt, time.time() will normally be 
acceptable.
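
As a rough sketch of how to inspect the clocks on your own system (Python 3.3 or later): time.get_clock_info reports the resolution of each named clock, and timeit.default_timer is the clock timeit itself uses, which on 3.3 and later is time.perf_counter. Note that time.clock() was later removed, in Python 3.8. The workload being timed here is an arbitrary stand-in.

```python
import time
import timeit

# Report what the platform says about each clock.
for name in ('time', 'monotonic', 'perf_counter', 'process_time'):
    info = time.get_clock_info(name)
    print('%-13s resolution=%g adjustable=%s'
          % (name, info.resolution, info.adjustable))

# Use the same clock that timeit uses to time a single piece of work.
timer = timeit.default_timer
t0 = timer()
total = sum(range(100000))  # arbitrary workload for illustration
elapsed = timer() - t0
print('elapsed: %.6f seconds' % elapsed)
```

This still measures a single run, so the result carries all the noise described earlier; it only removes the clock itself as a source of error.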

https://docs.python.org/3/library/time.html

https://docs.python.org/3/library/timeit.html



-- 
Steven


Re: [Tutor] could somebody please explain...

2014-10-03 Thread Clayton Kirkwood
Steven, I don't disagree with most of your analysis. I didn't know of other
timing routines, and all of the superfluous stuff adds up. However, for a
simple test, the route that I took was adequate, I think. Yes, I timed the
whole wake-up-to-get-to-work routine, but the important element is that
whatever I timed was accurate between runs. And that is all that was
important: to see the relative times. I also ran the complete program
multiple times and found the test to be relatively consistent. I appreciate
your pointing out timeit(); I'll have to look into that. Thanks for taking
the time to review and comment.

Clayton






[Tutor] pygame module

2014-10-03 Thread Rob Ward
I downloaded the 3.4 version of Python, but there is no matching binary file
for pygame. I've tried every 1.9.1 file and still can't import pygame. Would
an older version of Python work?

rob


Re: [Tutor] pygame module

2014-10-03 Thread Danny Yoo
On Fri, Oct 3, 2014 at 2:27 PM, Rob Ward azzai...@gmail.com wrote:
 I downloaded the 3.4 version of Python, but there is no matching binary file
 for pygame. I've tried every 1.9.1 file and still can't import pygame. Would an
 older version of Python work?


You might have better results contacting the Pygame community for this
question, as you're asking an installation question on a third-party
library.

http://pygame.org/wiki/info


According to their FAQ, Pygame 1.9.2 should support Python 3:

http://www.pygame.org/wiki/FrequentlyAskedQuestions#Does Pygame work with Python 3?