[Tutor] pattern matching is too slow

2005-08-12 Thread Vinay Reddy
Hi,
I am writing a front-end for an application (mplayer). I used the
popen2 call to open pipes for bi-directional communication.

I set the output pipe from the application to non-blocking mode using:
fcntl.fcntl(self.mplayerOut, fcntl.F_SETFL, os.O_NONBLOCK)

The problem is that it takes about 10 seconds just to parse through
the initial dump of mplayer (during which mplayer stops playing).

I'm using the following code to read from 'mplayerOut':

while True:
    try:
        temp = self.mplayerOut.readline()
        print temp
        if re.compile("^A:").search(temp):
            print "abc"
    except StandardError:
        break

If I remove the re.compile() statement, then the output is
instantaneous and there is no delay. Why is pattern matching so slow?
It's increasing the time almost by 1 second per line of output. How
can I get it to run faster?

Any help will be appreciated.

Regards,
Vinay Reddy

PS: This is my first python program.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] More problems with Learning Python example (fwd)

2005-08-12 Thread Danny Yoo


-- Forwarded message --
Date: Fri, 12 Aug 2005 00:29:00 -0500 (CDT)
From: -Terry- [EMAIL PROTECTED]
To: Danny Yoo [EMAIL PROTECTED]
Subject: Re: [Tutor] More problems with Learning Python example


Today (Aug 11, 2005) at 7:00pm, Danny Yoo spoke these wise words:

<snip>

> Developing this further, it might be a good thing to explicitly write a
> helper to run a function n times.  If we imagine that we have something
> like this:
>
> ##
> def time_trial(func, num_times):
>     """Applies a function func num_times."""
>     ... # fill me in
> ##
>
>
> then the logic of the block in do_timing() reduces to:
>
>
> for func in funcs:
>     totals[func] = 0.0
>     time_trial(func, num_times)
>
>
>
> I'm still a little shocked that Learning Python would have such code,
> though.  Can anyone double check this?  We should be sending errata
> reports if there are serious bugs like this in the book..
>
>
> If you have any questions on this, please ask questions.  Good luck!

Ok, I struck out on my own here and using your advice,
I came up with the following:

--------------
* makezeros.py
--------------

def lots_of_appends():
    zeros = []
    for i in range(1):
        zeros.append(0)

def one_multiply():
    zeros = [0] * 1


---------------
* my_timings.py
---------------

#!/usr/bin/python

import time, makezeros

def time_trial(func, num_times):
    total = 0.0
    starttime = time.clock()
    for num in range(num_times):
        apply(func)
    stoptime = time.clock()
    elapsed = stoptime - starttime
    total = total + elapsed
    return total

tries = 100
funcs = [makezeros.lots_of_appends, makezeros.one_multiply]

for func in funcs:
    took = time_trial(func, tries)
    print "Running %s %d times took %.3f seconds." % (func.__name__, tries, took)

Entering:

python my_timings.py >> results.txt

in a shell 10 times I get the following:

Running lots_of_appends 100 times took 0.400 seconds.
Running one_multiply 100 times took 0.020 seconds.
Running lots_of_appends 100 times took 0.460 seconds.
Running one_multiply 100 times took 0.000 seconds.   <- ?
Running lots_of_appends 100 times took 0.440 seconds.
Running one_multiply 100 times took 0.010 seconds.
Running lots_of_appends 100 times took 0.390 seconds.
Running one_multiply 100 times took 0.010 seconds.
Running lots_of_appends 100 times took 0.450 seconds.
Running one_multiply 100 times took 0.010 seconds.
Running lots_of_appends 100 times took 0.450 seconds.
Running one_multiply 100 times took 0.010 seconds.
Running lots_of_appends 100 times took 0.440 seconds.
Running one_multiply 100 times took 0.010 seconds.
Running lots_of_appends 100 times took 0.440 seconds.
Running one_multiply 100 times took 0.010 seconds.
Running lots_of_appends 100 times took 0.460 seconds.
Running one_multiply 100 times took 0.020 seconds.
Running lots_of_appends 100 times took 0.410 seconds.
Running one_multiply 100 times took 0.010 seconds.

Is the indicated result a fluke value which I can just
disregard or is there a problem with my code? The 0.000
value shows up about once in every 25-30 runs.

Any other comments?

This whole thing really had me going in circles. Surely
others have run into this problem. Again, many thanks
to Bob and especially to Danny for being so helpful
and spending time helping me.

Sincerely,
--
Terry

Terry Randall tvbareATsocketDOTnet
Linux Counter Project User# 98233
If only Snoopy had Slackware...

He is your friend, your partner, your defender, your dog.
You are his life, his love, his leader. He will be yours,
faithful and true, to the last beat of his heart. You owe
it to him to be worthy of such devotion. -- Unknown



Re: [Tutor] Default class in module

2005-08-12 Thread Jan Eden
Hi Kent,

Kent Johnson wrote on 11.08.2005:

> I don't know of any way to do exactly what you ask. However you can
> use the __init__.py module of the package to promote classes to
> package level visibility.
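
A minimal sketch of what promoting a class in __init__.py can look like,
written for Python 3. The package name demo_shapes_pkg, the module circle.py
and the class Circle are all hypothetical; the files are written to a
temporary directory only so that the example is self-contained.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (all names here are made up):
#   demo_shapes_pkg/circle.py    defines Circle
#   demo_shapes_pkg/__init__.py  re-exports Circle at package level
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_shapes_pkg")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "circle.py"), "w") as f:
    f.write("class Circle(object):\n    sides = 0\n")

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from demo_shapes_pkg.circle import Circle\n")

# Make the temporary directory importable, then import the package.
sys.path.insert(0, root)
importlib.invalidate_caches()
shapes_pkg = importlib.import_module("demo_shapes_pkg")

# Callers can now write demo_shapes_pkg.Circle instead of
# demo_shapes_pkg.circle.Circle; both names refer to the same class.
print(shapes_pkg.Circle is shapes_pkg.circle.Circle)
```

The class still lives in its own module; the package merely exposes a
second, shorter name for it.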


Nice - that's a good start.

Thank you,

Jan
-- 
Common sense is what tells you that the world is flat.


Re: [Tutor] i want to build my own arabic training corpus data and use the NLTK to deal with

2005-08-12 Thread enas khalil
Hi Danny,
Thanks for this help.
It is now OK to tokenize my text, but the next step I want is to use the
tagger class to tag my text with my own tags. How can I start on this?

Also, for any further NLTK help, is there a specific mailing list I could
go to?

Many thanks

Danny Yoo [EMAIL PROTECTED] wrote:

> On Wed, 3 Aug 2005, enas khalil wrote:
>
> > i want to build my own arabic training corpus data and use the NLTK to
> > parse and make test for unknown data
>
> Hi Enas,
>
> By NLTK, I'll assume that you mean the Natural Language Toolkit at:
>
>     http://nltk.sourceforge.net/
>
> Have you gone through the introduction and tutorials from the NLTK
> webpage?
>
>     http://nltk.sourceforge.net/getting_started.html
>     http://nltk.sourceforge.net/tutorial/index.html
>
> > how can i build this file and make it available to treat with it using
> > different NLTK classes
>
> Your question is a bit specialized, so we may not be the best people to
> ask about this.
>
> The part that you may want to think about is how to break a corpus into a
> sequence of tokens, since tokens are primarily what the NLTK classes work
> with.
>
> This may or may not be immediately easy, depending on how much you can
> take advantage of existing NLTK classes.  As the documentation in NLTK
> mentions:
>
> """If we turn to languages other than English, segmenting words can be
> even more of a challenge. For example, in Chinese orthography, characters
> correspond to monosyllabic morphemes. Many morphemes are words in their
> own right, but many words contain more than one morpheme; most of them
> consist of two morphemes. However, there is no visual representation of
> word boundaries in Chinese text."""
>
> I don't know how Arabic works, so I'm not sure if the caveat above is
> something that we need to worry about.
>
> There are a few built-in NLTK tokenizers that break a corpus into tokens,
> including a WhitespaceTokenizer and a RegexpTokenizer class, both
> introduced here:
>
>     http://nltk.sourceforge.net/tutorial/tokenization/nochunks.html
>
> For example:
>
> ##
> >>> import nltk.token
> >>> mytext = nltk.token.Token(TEXT="hello world this is a test")
> >>> mytext
> <hello world this is a test>
> ##
>
> At the moment, this is a single token.  We can use a naive approach in
> breaking this into words by using whitespace as our delimiter:
>
> ##
> >>> import nltk.tokenizer
> >>> nltk.tokenizer.WhitespaceTokenizer(SUBTOKENS='WORDS').tokenize(mytext)
> >>> mytext
> [<hello>, <world>, <this>, <is>, <a>, <test>]
> ##
>
> And now our text is broken into a sequence of discrete tokens, where we
> can now play with the 'subtokens' of our text:
>
> ##
> >>> mytext['WORDS']
> [<hello>, <world>, <this>, <is>, <a>, <test>]
> >>> len(mytext['WORDS'])
> 6
> ##
>
> If Arabic follows conventions that fit closely with the assumptions of
> those tokenizers, you should be in good shape.  Otherwise, you'll probably
> have to do some work to build your own customized tokenizers.
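
The two tokenizing strategies mentioned above (split on whitespace, or pull
out everything that matches a pattern) can be sketched in plain Python 3
without NLTK. The helper names below are hypothetical and are not part of
the NLTK API; they only illustrate the idea.

```python
import re

def whitespace_tokenize(text):
    """Split on runs of whitespace, the naive approach from the example above."""
    return text.split()

def regexp_tokenize(text, pattern=r"\w+"):
    """Return every substring matching pattern; punctuation is dropped by default."""
    return re.findall(pattern, text)

print(whitespace_tokenize("hello world this is a test"))
print(regexp_tokenize("hello, world!"))
```

Whether either strategy is adequate for Arabic depends on how word
boundaries, clitics and diacritics should be treated, which this sketch
does not address.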


Re: [Tutor] More problems with Learning Python example

2005-08-12 Thread Alan G
Something odd here...


> def do_timing(num_times, *funcs):
>     totals = {}
>     for func in funcs:

Here you assign func to each function in turn.

>         totals[func] = 0.0

And here you create a key with it.

>         starttime = time.clock()        # record starting time
>         for x in range(num_times):
>             for func in funcs:

And the same here, but the result will be that when you exit this
loop func will always be the last function.

>         totals[func] = totals[func] + elapsed

But you won't have created a key for the last function so
this falls over.

> Traceback (most recent call last):
>   File "timings.py", line 16, in ?
>     do_timing(100, makezeros.lots_of_appends, makezeros.one_multiply)
>   File "timings.py", line 12, in do_timing
>     totals[func] = totals[func] + elapsed
> KeyError: <function one_multiply at 0x403eaf0c>

I'm not quite sure what the second funcs loop is doing, but that's
the reason for the key error.
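
The failure can be reproduced in a few lines. This is a reduced sketch in
Python 3, not the book's code: the timing calls are left out and the two
functions are empty stand-ins, so only the reuse of the loop variable
remains.

```python
def lots_of_appends():
    pass

def one_multiply():
    pass

def broken(funcs):
    totals = {}
    for func in funcs:
        totals[func] = 0.0
        for func in funcs:      # rebinds the SAME name as the outer loop
            pass
        # func now refers to funcs[-1], which has no key yet on the first pass
        totals[func] = totals[func] + 1.0
    return totals

def fixed(funcs):
    totals = {}
    for func in funcs:          # create every key first
        totals[func] = 0.0
    for func in funcs:          # then update each entry in a separate loop
        totals[func] = totals[func] + 1.0
    return totals

try:
    broken([lots_of_appends, one_multiply])
except KeyError as err:
    print("KeyError:", err)

print(len(fixed([lots_of_appends, one_multiply])))
```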

HTH,

Alan G. 



Re: [Tutor] More problems with Learning Python example (fwd)

2005-08-12 Thread Alan G
> Running lots_of_appends 100 times took 0.460 seconds.
> Running one_multiply 100 times took 0.000 seconds.   <- ?
> Running lots_of_appends 100 times took 0.440 seconds.
> Running one_multiply 100 times took 0.010 seconds.
>
> Is the indicated result a fluke value which I can just
> disregard or is there a problem with my code? The 0.000
> value shows up about once in every 25-30 runs.

As you can see, the multiply function is much faster, and
the 0.000 figure just means the timing was so small it
didn't register - maybe your PC just happened not to be
doing anything else at the time so the code was still
in RAM or some such. This is a good example of why timing
tests must be done over many repetitions and averaged.
Since you are running near the limit of recordability,
you might increase the number of loop iterations to 1000...
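
For anyone repeating this today, here is a hedged sketch using the standard
timeit module in Python 3 (time.clock(), used in the thread, was removed in
Python 3.8). The list size of 1000 is an assumption, since the size in the
posted makezeros.py is not recoverable. The readings quoted above are all
multiples of 0.010, which suggests, though it is only a guess, that the
timer itself had roughly 10 ms granularity.

```python
import timeit

def lots_of_appends(n=1000):
    # n=1000 is an assumed size, not the value from the original post
    zeros = []
    for i in range(n):
        zeros.append(0)
    return zeros

def one_multiply(n=1000):
    return [0] * n

if __name__ == "__main__":
    for func in (lots_of_appends, one_multiply):
        # repeat() returns one total per trial; the minimum is usually
        # the reading least disturbed by other activity on the machine
        trials = timeit.repeat(func, number=1000, repeat=5)
        print("%s: best of 5 trials, %.6f s per 1000 calls"
              % (func.__name__, min(trials)))
```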

Alan G.


Re: [Tutor] pattern matching is too slow

2005-08-12 Thread Danny Yoo


On Fri, 12 Aug 2005, Vinay Reddy wrote:

> I'm using the following code to read from 'mplayerOut':
>
> while True:
>     try:
>         temp = self.mplayerOut.readline()
>         print temp
>         if re.compile("^A:").search(temp):
>             print "abc"
>     except StandardError:
>         break
>
> If I remove the re.compile() statement, then the output is
> instantaneous and there is no delay. Why is pattern matching so slow?


Hi Vinay,

Compiling a regular expression object can be expensive.  Doing the
compilation over and over is probably what's killing the performance
here.

I'd recommend yanking the regular expression compilation out of the inner
loop, and just reusing the regex object after you compile it once.

##
pattern = re.compile("^A:")
while True:
    try:
        temp = self.mplayerOut.readline()
        print temp
        if pattern.search(temp):
            print "abc"
    except StandardError:
        break
##
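
To measure the effect rather than guess at it, the two variants can be
timed side by side. This is a sketch in Python 3 with made-up input lines.
Note that current versions of the re module keep a cache of compiled
patterns, so the gap on a modern interpreter may be much smaller than the
one described in this thread.

```python
import re
import timeit

# Made-up input: every other line looks like an MPlayer status line.
lines = ["A: %.1f V: %.1f" % (i / 10.0, i / 10.0) if i % 2 == 0
         else "VIDEO: frame %d" % i
         for i in range(1000)]

def compile_in_loop():
    hits = 0
    for line in lines:
        if re.compile("^A:").search(line):   # compile requested on every line
            hits += 1
    return hits

def compile_once():
    pattern = re.compile("^A:")              # compiled a single time
    hits = 0
    for line in lines:
        if pattern.search(line):
            hits += 1
    return hits

if __name__ == "__main__":
    for func in (compile_in_loop, compile_once):
        seconds = timeit.timeit(func, number=100)
        print("%-16s %.4f s for 100 passes" % (func.__name__, seconds))
```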


By the way, there are other things in this program that should be fixed.
The way it reads lines from the file is non-idiomatic.  For an example of
what people will usually do to go through a file's lines, see a tutorial
like Alan Gauld's Learning to Program:

http://www.freenetpages.co.uk/hp/alan.gauld/tutfiles.htm
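
A short sketch of the usual idiom in Python 3: iterate over the file object
directly instead of calling readline() in a "while True" loop. It assumes an
ordinary blocking file-like object; io.StringIO stands in for the pipe, and
count_status_lines is a hypothetical helper name.

```python
import io
import re

def count_status_lines(stream):
    """Count the lines that start with 'A:' in any iterable of lines."""
    pattern = re.compile("^A:")
    count = 0
    for line in stream:          # the loop ends by itself at end of file
        if pattern.search(line):
            count += 1
    return count

fake_output = io.StringIO("MPlayer starting\nA: 0.1 V: 0.1\nA: 0.2 V: 0.2\n")
print(count_status_lines(fake_output))
```

With a non-blocking pipe, as in the original question, an empty pipe raises
an exception instead of ending the loop, so that case still needs its own
handling.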


For more details about regular expressions, you may find the Regular
Expression HOWTO guide useful:

http://www.amk.ca/python/howto/regex/

If you have more questions, please feel free to ask.  Good luck!



Re: [Tutor] i want to build my own arabic training corpus data and use the NLTK to deal with

2005-08-12 Thread Danny Yoo


On Fri, 12 Aug 2005, enas khalil wrote:

> hi Danny, Thanks for this help It is now ok to tokenize my text but the
> next step i want is to use tagger class to tag my text with own tags.
> how can i start this?

Hi Enas,

I'd strongly recommend you really go through the NLTK tutorials: the
developers of NLTK have put a lot of effort into making an excellent set
of tutorials.  It would be a shame to waste their work.

http://nltk.sourceforge.net/tutorial/index.html

The tutorial on Tagging seems to answer your question directly.


> Also for any NLTK further help is there a specific mailing list i could
> go on

I think you're looking for the nltk forums:

http://sourceforge.net/forum/?group_id=30982

As a warning: again, read through the tutorials first before jumping in
there.  I suspect that many of the people who work with NLTK are
researchers; they may want to see that you've done your homework before
they answer your questions.  In general, the guidelines in:

http://www.catb.org/~esr/faqs/smart-questions.html

will probably apply here.


Good luck to you.



Re: [Tutor] pattern matching is too slow

2005-08-12 Thread Alan G
> while True:
>     try:
>         temp = self.mplayerOut.readline()
>         print temp
>         if re.compile("^A:").search(temp):

The point of re.compile is to compile the re once *outside* the loop.
Compiling the re is slow so you should only do it outside.

As a first step replace re.compile with re.search

        if re.search("^A:", temp):
>             print "abc"
>     except StandardError:
>         break

As a second step move the compile before the loop

reg = re.compile("^A:")

Then inside the loop use the compiled expression

        if reg.search(temp):

The first step should be faster, the second step faster still.

Finally, looking at your regex you might be better using
a simple string method - startswith()

        if temp.startswith("A"):

That should be even faster still.
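
A small Python 3 sketch comparing the two tests on made-up sample lines. It
uses the prefix "A:" (with the colon) so that startswith() accepts exactly
the lines the anchored regex "^A:" accepts; the function names are
hypothetical.

```python
import re

pattern = re.compile("^A:")

def is_status_regex(line):
    return pattern.search(line) is not None

def is_status_prefix(line):
    # No regex machinery at all: a plain prefix comparison.
    return line.startswith("A:")

samples = ["A: 12.3 V: 12.3", "Playing movie.avi", " A: indented", ""]
for line in samples:
    print(repr(line), is_status_regex(line), is_status_prefix(line))
```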

HTH,

Alan G
Author of the Learn to Program web tutor
http://www.freenetpages.co.uk/hp/alan.gauld


[Tutor] pychecker: x is None or x == None

2005-08-12 Thread Duncan Gibson

We've been programming in Python for about a year. Initially we had a
lot of tests of the form

if x == None:
    do_something()

but then someone thought that we should really change these to

if x is None:
    do_something()

However, if you run pychecker on these two snippets of code, it
complains about the second, and not the first:

x.py:6: Using is None, may not always work

So the question is, which one should we really be using?
If it is the second, how do I get pychecker to shut up?

I've hunted around in the documentation, and if there is a clear
discussion about this issue, I must have missed it.

Cheers
Duncan


Re: [Tutor] pychecker: x is None or x == None

2005-08-12 Thread Kent Johnson
Duncan Gibson wrote:
> We've been programming in Python for about a year. Initially we had a
> lot of tests of the form
>
> if x == None:
>     do_something()
>
> but then someone thought that we should really change these to
>
> if x is None:
>     do_something()
>
> However, if you run pychecker on these two snippets of code, it
> complains about the second, and not the first:
>
> x.py:6: Using is None, may not always work
>
> So the question is, which one should we really be using?
> If it is the second, how do I get pychecker to shut up?

Searching comp.lang.python for 'pychecker is None' finds this discussion:
http://groups.google.com/group/comp.lang.python/browse_frm/thread/a289d565a40fa435/9afaeb22763aadff?q=pychecker+%22is+None%22&rnum=1&hl=en#9afaeb22763aadff

which says that pychecker is confused by the comparison to a constant and you
should ignore it.

There is a pychecker test (test_input\test90.py and test_output\test90) which
shows pychecker ignoring 'is not None' so I think this is a pychecker bug in
Python 2.4. It is in the bug tracker here:
https://sourceforge.net/tracker/?group_id=24686&atid=382217&func=detail&aid=1227538

The bug references PEP 290 which clearly says that 'is None' is the preferred 
test:
http://www.python.org/peps/pep-0290.html#testing-for-none
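
One way to see why the identity test is the safer of the two: "==" can be
overridden by a class, while "is" cannot. A minimal Python 3 sketch follows;
the class AlwaysEqual is hypothetical and deliberately badly behaved.

```python
class AlwaysEqual(object):
    """Hypothetical class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True

x = AlwaysEqual()

# '==' asks the object, so a custom __eq__ can give a misleading answer.
print(x == None)

# 'is' compares identity; no object other than None itself can pass.
print(x is None)
```

The identity test also avoids a method call, which is part of why PEP 290
recommends it.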

I don't see a way to turn off this test but I haven't looked in detail at the 
config options.

Kent



[Tutor] how to run pdb in idle

2005-08-12 Thread 铁石
  I am a newbie to Python. I use pdb on the command
line quite well. In IDLE, I open the tree.py file and type:
>>> pdb.run('tree.py')
> <string>(1)?()
(Pdb) l
[EOF]
(Pdb) list 10
[EOF]
(Pdb)
As you can see, it didn't work as it does on the command line.
  Can somebody tell me why?
(tree.py is a file I downloaded from the net. It runs well; I am just trying to
understand how it works.)



[EMAIL PROTECTED]
  2005-08-12



Re: [Tutor] how to run pdb in idle

2005-08-12 Thread Alan Gauld

铁石 [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
>  I am a newbie to python.I use pdb in command
> line quite well.

pdb is really best used in the normal OS window.

> In idle, I open the tree.py file, and type:
> >>> pdb.run('tree.py')
> > <string>(1)?()
> (Pdb) l

While I have used pdb inside IDLE it's not great; you are
much better off using the graphical debugger built into
IDLE. It's not quite as powerful as pdb but it is easier
to use! If you are on Windows the pythonwin debugger
is better than either pdb or IDLE...

Alan G 





Re: [Tutor] pattern matching is too slow (a little OT)

2005-08-12 Thread Hugo González Monteverde
Hi again =)

That's exactly why I use quiet with MPlayer: the output is
simply too much to be parsed, and MPlayer will block waiting for you to
read the buffer (so it stops playing).

My suggestion is: forget about parsing the huge amounts of output
MPlayer gives, and use its slave mode instead to find out where in the
video you are.
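
A rough Python 3 sketch of the slave-mode approach. It assumes MPlayer is
started with -slave and -quiet, is sent the get_time_pos command on stdin,
and replies with a line of the form ANS_TIME_POSITION=<seconds>. The helper
names and the file name movie.avi are hypothetical, and the part that
actually talks to MPlayer is left as a comment because it needs a real
player and a real file.

```python
import re
import subprocess

# Reply format assumed for the get_time_pos slave command.
ANSWER = re.compile(r"^ANS_TIME_POSITION=([0-9.]+)")

def mplayer_command(path):
    """Build the argv list: -slave reads commands on stdin, -quiet trims output."""
    return ["mplayer", "-slave", "-quiet", path]

def parse_time_position(line):
    """Return the position in seconds from a reply line, or None."""
    match = ANSWER.match(line)
    return float(match.group(1)) if match else None

def query_position(proc):
    """Ask a running MPlayer (a text-mode subprocess.Popen) where it is."""
    proc.stdin.write("get_time_pos\n")
    proc.stdin.flush()
    for line in proc.stdout:
        position = parse_time_position(line)
        if position is not None:
            return position
    return None

# Usage, assuming mplayer is installed and movie.avi exists:
# proc = subprocess.Popen(mplayer_command("movie.avi"),
#                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
#                         universal_newlines=True)
# print(query_position(proc))
```

Polling on demand like this means the front-end reads one short reply per
query instead of a continuous stream of status lines.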



Re: [Tutor] pattern matching is too slow

2005-08-12 Thread Vinay Reddy
> Reading through other posts, looks like you got somewhere with the
> nonblocking IO. Can you comment on what you did to get it working? The
> whole fcntl thing?

I am able to use non-blocking I/O, but I am unable to get the mplayer
status messages. They're just not there in the mplayer output pipe. I
just posted on the mplayer community regarding this.

To use non-blocking I/O:
fcntl.fcntl(<file descriptor>, fcntl.F_SETFL, os.O_NONBLOCK) is
enough. If there's nothing to read from a pipe, instead of blocking,
an exception is generated.
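
A self-contained sketch of the same idea in Python 3 on a Unix system, with
an os.pipe() standing in for the MPlayer pipe. It adds O_NONBLOCK to the
descriptor's existing flags instead of overwriting them, a small variation
on the call above. The helper name read_available is hypothetical.

```python
import fcntl
import os

read_fd, write_fd = os.pipe()

# Fetch the current flags and add O_NONBLOCK to them.
flags = fcntl.fcntl(read_fd, fcntl.F_GETFL)
fcntl.fcntl(read_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def read_available(fd, size=4096):
    """Return the bytes currently in the pipe, or None if it is empty."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        # Nothing to read right now; a blocking descriptor would wait here.
        return None

print(read_available(read_fd))            # nothing has been written yet
os.write(write_fd, b"A: 0.1 V: 0.1\n")
print(read_available(read_fd))            # the bytes just written
```

Catching only BlockingIOError keeps real errors visible, unlike a broad
"except StandardError" around the whole loop body.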

Regards,
Vinay


Re: [Tutor] Curses example on Linux?

2005-08-12 Thread Alan Gauld

Hossein Movahhedian [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
> ky = chr(msvcrt.getch()). The other problem is that when
> the program is finished the previous terminal state is not
> restored (I am using xterm on Linux).

OK, experimenting with the Linux stty command shows that

$ stty echo -nl

will restore the terminal to the proper settings.
I still haven't figured out how to do it from inside
python/curses - that's my next step!
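
One way to do this from inside Python is the standard curses.wrapper()
helper, which initialises the screen, calls your function, and restores the
terminal afterwards even if the function raises an exception. A minimal
Python 3 sketch follows; the function name main and the prompt text are
arbitrary, and the isatty() guard is only there so the file can also be
imported or run without a terminal.

```python
import curses
import sys

def main(stdscr):
    # stdscr is the window object that curses.wrapper() passes in.
    stdscr.addstr(0, 0, "Press any key to quit")
    stdscr.refresh()
    return stdscr.getch()

# Only start curses when attached to a real terminal.
if __name__ == "__main__" and sys.stdin.isatty() and sys.stdout.isatty():
    key = curses.wrapper(main)   # terminal settings are restored on return
    print("terminal restored; key code was", key)
```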

Alan G. 


