[Tutor] pattern matching is too slow
Hi,

I am writing a front-end for an application (mplayer). I used the popen2 call to open pipes for bi-directional communication. I set the output pipe from the application to non-blocking mode using:

    fcntl.fcntl(self.mplayerOut, fcntl.F_SETFL, os.O_NONBLOCK)

The problem is that it takes about 10 seconds just to parse through the initial dump of mplayer (during which mplayer stops playing). I'm using the following code to read from 'mplayerOut':

    while True:
        try:
            temp = self.mplayerOut.readline()
            print temp
            if re.compile("^A:").search(temp):
                print "abc"
        except StandardError:
            break

If I remove the re.compile() statement, then the output is instantaneous and there is no delay. Why is pattern matching so slow? It's increasing the time by almost 1 second per line of output. How can I get it to run faster?

Any help will be appreciated.

Regards,
Vinay Reddy

PS: This is my first python program.

___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] More problems with Learning Python example (fwd)
-- Forwarded message --
Date: Fri, 12 Aug 2005 00:29:00 -0500 (CDT)
From: -Terry- [EMAIL PROTECTED]
To: Danny Yoo [EMAIL PROTECTED]
Subject: Re: [Tutor] More problems with Learning Python example

Today (Aug 11, 2005) at 7:00pm, Danny Yoo spoke these wise words:

[snip]

> Developing this further, it might be a good thing to explicitly write a
> helper to run a function n times. If we imagine that we have something
> like this:
>
> ##
> def time_trial(func, num_times):
>     """Applies a function func num_times."""
>     ... # fill me in
> ##
>
> then the logic of the block in do_timing() reduces to:
>
> for func in funcs:
>     totals[func] = 0.0
>     time_trial(func, num_times)
>
> I'm still a little shocked that Learning Python would have such code,
> though. Can anyone double check this? We should be sending errata
> reports if there are serious bugs like this in the book.
>
> If you have any questions on this, please ask questions. Good luck!

OK, I struck out on my own here and, using your advice, I came up with the following:

--- makezeros.py ---

    def lots_of_appends():
        zeros = []
        for i in range(1):
            zeros.append(0)

    def one_multiply():
        zeros = [0] * 1

--- my_timings.py ---

    #!/usr/bin/python
    import time, makezeros

    def time_trial(func, num_times):
        total = 0.0
        starttime = time.clock()
        for num in range(num_times):
            apply(func)
        stoptime = time.clock()
        elapsed = stoptime - starttime
        total = total + elapsed
        return total

    tries = 100
    funcs = [makezeros.lots_of_appends, makezeros.one_multiply]
    for func in funcs:
        took = time_trial(func, tries)
        print "Running %s %d times took %.3f seconds." % (func.__name__, tries, took)

Entering "python my_timings.py results.txt" in a shell 10 times, I get the following:

    Running lots_of_appends 100 times took 0.400 seconds.
    Running one_multiply 100 times took 0.020 seconds.
    Running lots_of_appends 100 times took 0.460 seconds.
    Running one_multiply 100 times took 0.000 seconds.   <- ?
    Running lots_of_appends 100 times took 0.440 seconds.
    Running one_multiply 100 times took 0.010 seconds.
    Running lots_of_appends 100 times took 0.390 seconds.
    Running one_multiply 100 times took 0.010 seconds.
    Running lots_of_appends 100 times took 0.450 seconds.
    Running one_multiply 100 times took 0.010 seconds.
    Running lots_of_appends 100 times took 0.450 seconds.
    Running one_multiply 100 times took 0.010 seconds.
    Running lots_of_appends 100 times took 0.440 seconds.
    Running one_multiply 100 times took 0.010 seconds.
    Running lots_of_appends 100 times took 0.440 seconds.
    Running one_multiply 100 times took 0.010 seconds.
    Running lots_of_appends 100 times took 0.460 seconds.
    Running one_multiply 100 times took 0.020 seconds.
    Running lots_of_appends 100 times took 0.410 seconds.
    Running one_multiply 100 times took 0.010 seconds.

Is the indicated result a fluke value which I can just disregard, or is there a problem with my code? The 0.000 value shows up about once in every 25-30 runs.

Any other comments? This whole thing really had me going in circles. Surely others have run into this problem.

Again, many thanks to Bob and especially to Danny for being so helpful and spending time helping me.

Sincerely,
Terry
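For anyone trying Terry's script on a current interpreter: time.clock() was removed in Python 3.8 and apply() was removed in Python 3, so my_timings.py no longer runs as written. A minimal Python 3 sketch of the same time_trial idea follows, assuming time.perf_counter() as the clock; the list size n=10000 is a hypothetical choice, not a value from the thread.

```python
import time

def lots_of_appends(n=10000):      # n is a hypothetical list size
    zeros = []
    for _ in range(n):
        zeros.append(0)

def one_multiply(n=10000):
    zeros = [0] * n

def time_trial(func, num_times):
    """Call func() num_times times and return the elapsed seconds."""
    start = time.perf_counter()    # high-resolution clock (Python 3.3+)
    for _ in range(num_times):
        func()                     # replaces the Python 2 apply(func)
    return time.perf_counter() - start

tries = 100
for func in (lots_of_appends, one_multiply):
    took = time_trial(func, tries)
    print("Running %s %d times took %.3f seconds." % (func.__name__, tries, took))
```

Calling func() directly has been the idiomatic spelling since long before apply() was removed.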
Re: [Tutor] Default class in module
Hi Kent,

Kent Johnson wrote on 11.08.2005:
> I don't know of any way to do exactly what you ask. However, you can use
> the __init__.py module of the package to promote classes to package-level
> visibility.

Nice - that's a good start.

Thank you,
Jan
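A minimal sketch of what Kent describes, with hypothetical names throughout (a package shapes, a module circle.py and a class Circle). The package is written to a temporary directory only so the example runs on its own; in a real project the two files would simply live on disk.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: shapes/circle.py defines Circle.
root = Path(tempfile.mkdtemp())
pkg = root / "shapes"
pkg.mkdir()
(pkg / "circle.py").write_text(
    "class Circle:\n"
    "    def __init__(self, r):\n"
    "        self.r = r\n"
)
# shapes/__init__.py re-exports Circle, promoting it to package level.
(pkg / "__init__.py").write_text("from .circle import Circle\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()      # make sure the new directory is noticed
shapes = importlib.import_module("shapes")

# Callers can now write shapes.Circle instead of shapes.circle.Circle.
c = shapes.Circle(2)
print(type(c).__module__, c.r)     # shapes.circle 2
```

The class still lives in shapes.circle; the import in __init__.py just binds an extra name at package level.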
Re: [Tutor] i want to build my own arabic training corpus data and use the NLTK to deal with
Hi Danny,

Thanks for this help. It is now OK to tokenize my text, but the next step I want is to use the tagger class to tag my text with my own tags. How can I start this?

Also, for any further NLTK help, is there a specific mailing list I could go to?

Many thanks

Danny Yoo [EMAIL PROTECTED] wrote:
> On Wed, 3 Aug 2005, enas khalil wrote:
>
>> I want to build my own Arabic training corpus data and use the NLTK to
>> parse and make tests for unknown data
>
> Hi Enas,
>
> By NLTK, I'll assume that you mean the Natural Language Toolkit at:
>
>     http://nltk.sourceforge.net/
>
> Have you gone through the introduction and tutorials from the NLTK webpage?
>
>     http://nltk.sourceforge.net/getting_started.html
>     http://nltk.sourceforge.net/tutorial/index.html
>
>> How can I build this file and make it available to work with using
>> different NLTK classes?
>
> Your question is a bit specialized, so we may not be the best people to
> ask about this.
>
> The part that you may want to think about is how to break a corpus into a
> sequence of tokens, since tokens are primarily what the NLTK classes work
> with.
> This may or may not be immediately easy, depending on how much you can
> take advantage of existing NLTK classes. As the documentation in NLTK
> mentions:
>
> """If we turn to languages other than English, segmenting words can be
> even more of a challenge. For example, in Chinese orthography, characters
> correspond to monosyllabic morphemes. Many morphemes are words in their
> own right, but many words contain more than one morpheme; most of them
> consist of two morphemes. However, there is no visual representation of
> word boundaries in Chinese text."""
>
> I don't know how Arabic works, so I'm not sure if the caveat above is
> something that we need to worry about.
>
> There are a few built-in NLTK tokenizers that break a corpus into tokens,
> including a WhitespaceTokenizer and a RegexpTokenizer class, both
> introduced here:
>
>     http://nltk.sourceforge.net/tutorial/tokenization/nochunks.html
>
> For example:
>
> ##
> >>> import nltk.token
> >>> mytext = nltk.token.Token(TEXT="hello world this is a test")
> >>> mytext
> ##
>
> At the moment, this is a single token. We can use a naive approach in
> breaking this into words by using whitespace as our delimiter:
>
> ##
> >>> import nltk.tokenizer
> >>> nltk.tokenizer.WhitespaceTokenizer(SUBTOKENS='WORDS').tokenize(mytext)
> >>> mytext
> [, , , , , ]
> ##
>
> And now our text is broken into a sequence of discrete tokens, where we
> can now play with the 'subtokens' of our text:
>
> ##
> >>> mytext['WORDS']
> [, , , , , ]
> >>> len(mytext['WORDS'])
> 6
> ##
>
> If Arabic follows conventions that fit closely with the assumptions of
> those tokenizers, you should be in good shape. Otherwise, you'll probably
> have to do some work to build your own customized tokenizers.
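To make the whitespace-versus-regexp distinction concrete, here is a plain-Python approximation of the two tokenizer ideas Danny mentions. It is not NLTK's own API (the nltk.token and nltk.tokenizer modules quoted above date from 2005 and have since changed); the function names and the Arabic sample sentence are illustrative assumptions.

```python
import re

def whitespace_tokenize(text):
    # Split on runs of whitespace, roughly what a whitespace tokenizer does.
    return text.split()

def regexp_tokenize(text, pattern=r"\w+"):
    # Keep only substrings matching `pattern`. In Python 3, \w matches
    # Unicode letters, so Arabic words are picked up as well.
    return re.findall(pattern, text)

print(whitespace_tokenize("hello world this is a test"))
# ['hello', 'world', 'this', 'is', 'a', 'test']

arabic = "مرحبا بالعالم، هذا اختبار"
print(whitespace_tokenize(arabic))   # the Arabic comma stays attached to a word
print(regexp_tokenize(arabic))       # punctuation is dropped, four words remain
```

The difference on the Arabic sample is the kind of thing to check early: whitespace splitting leaves punctuation glued to words, while a regular expression lets you decide what counts as a token.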
Re: [Tutor] More problems with Learning Python example
Something odd here...

> def do_timing(num_times, *funcs):
>     totals = {}
>     for func in funcs:

Here you assign func to each function in turn.

>         totals[func] = 0.0

And here you create a key with it.

>         starttime = time.clock()    # record starting time
>         for x in range(num_times):
>             for func in funcs:

And the same here, but the result will be that when you exit this loop, func will always be the last function.

>         totals[func] = totals[func] + elapsed

But you won't have created a key for the last function, so this falls over.

> Traceback (most recent call last):
>   File "timings.py", line 16, in ?
>     do_timing(100, makezeros.lots_of_appends, makezeros.one_multiply)
>   File "timings.py", line 12, in do_timing
>     totals[func] = totals[func] + elapsed
> KeyError: <function one_multiply at 0x403eaf0c>

I'm not quite sure what the second funcs loop is doing, but that's the reason for the KeyError.

HTH,
Alan G.
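One way to avoid both problems Alan points out is to create every key before any timing starts and to loop over funcs only once, so func is never silently rebound. This is a Python 3 sketch, not the book's code; lots_of_appends and one_multiply are hypothetical stand-ins for the makezeros functions, and time.perf_counter() replaces time.clock().

```python
import time

def do_timing(num_times, *funcs):
    # Create every key up front, so no lookup can raise KeyError.
    totals = {func: 0.0 for func in funcs}
    for func in funcs:                 # the only loop over funcs
        start = time.perf_counter()
        for _ in range(num_times):
            func()
        totals[func] += time.perf_counter() - start
    return totals

def lots_of_appends():                 # hypothetical stand-ins for makezeros
    zeros = []
    for _ in range(10000):
        zeros.append(0)

def one_multiply():
    zeros = [0] * 10000

for func, seconds in do_timing(100, lots_of_appends, one_multiply).items():
    print("%s: %.3f seconds" % (func.__name__, seconds))
```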
Re: [Tutor] More problems with Learning Python example (fwd)
> Running lots_of_appends 100 times took 0.460 seconds.
> Running one_multiply 100 times took 0.000 seconds.   <- ?
> Running lots_of_appends 100 times took 0.440 seconds.
> Running one_multiply 100 times took 0.010 seconds.
>
> Is the indicated result a fluke value which I can just disregard, or is
> there a problem with my code? The 0.000 value shows up about once in
> every 25-30 runs.

As you can see, the multiply function is much faster, and the 0.000 figure just means the timing was so small it didn't register. Maybe your PC just happened not to be doing anything else at the time, so the code was still in RAM or some such.

This is a good example of why timing tests must be done over many repetitions and averaged. Since you are running near the limit of recordability, you might increase the number of loop iterations to 1000...

Alan G.
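Every figure Terry posted is a multiple of 0.010 s, which suggests the clock he used advances in roughly 10 ms steps; a run that finishes within one step reads as 0.000. The standard timeit module handles the "many repetitions" part for you. A minimal sketch, assuming a hypothetical list size of 10000:

```python
import timeit

def one_multiply():
    zeros = [0] * 10000   # hypothetical list size

# Five independent measurements, each timing 1000 calls.
runs = timeit.repeat(one_multiply, number=1000, repeat=5)
print("min %.4f s, mean %.4f s" % (min(runs), sum(runs) / len(runs)))
```

The timeit documentation suggests that the minimum of the repeats is usually the most informative figure, since higher values tend to reflect interference from other processes rather than the code being measured.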
Re: [Tutor] pattern matching is too slow
On Fri, 12 Aug 2005, Vinay Reddy wrote:

> I'm using the following code to read from 'mplayerOut':
>
>     while True:
>         try:
>             temp = self.mplayerOut.readline()
>             print temp
>             if re.compile("^A:").search(temp):
>                 print "abc"
>         except StandardError:
>             break
>
> If I remove the re.compile() statement, then the output is instantaneous
> and there is no delay. Why is pattern matching so slow?

Hi Vinay,

Compiling a regular expression object can be expensive. Doing the compilation over and over is probably what's killing the performance here. I'd recommend yanking the regular expression compilation out of the inner loop and just reusing the regex object after you compile it once.

##
pattern = re.compile("^A:")
while True:
    try:
        temp = self.mplayerOut.readline()
        print temp
        if pattern.search(temp):
            print "abc"
    except StandardError:
        break
##

By the way, there are other things in this program that should be fixed. The way it reads lines from the file is non-idiomatic. For an example of what people usually do to go through a file's lines, see a tutorial like Alan Gauld's Learning to Program:

    http://www.freenetpages.co.uk/hp/alan.gauld/tutfiles.htm

For more details about regular expressions, you may find the Regular Expression HOWTO guide useful:

    http://www.amk.ca/python/howto/regex/

If you have more questions, please feel free to ask. Good luck!
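For reference, the idiomatic pattern Danny alludes to is iterating over the file object directly instead of calling readline() inside while True. A minimal Python 3 sketch, assuming an ordinary blocking file-like object; io.StringIO stands in for the mplayer pipe, and the sample lines are made up.

```python
import io
import re

pattern = re.compile("^A:")    # compiled once, outside the loop

def status_lines(stream):
    """Yield the lines of a file-like object that start with 'A:'."""
    for line in stream:        # iterate the file directly
        if pattern.search(line):
            yield line.rstrip("\n")

fake_output = io.StringIO("MPlayer dev\nA:   0.5 V:   0.5\nAO: 44100Hz\nA:   1.0 V:   1.0\n")
print(list(status_lines(fake_output)))
# ['A:   0.5 V:   0.5', 'A:   1.0 V:   1.0']
```

Note that the loop ends cleanly at end of file, with no try/except needed; a pipe switched to non-blocking mode behaves differently and would need the exception handling from the original code.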
Re: [Tutor] i want to build my own arabic training corpus data and use the NLTK to deal with
On Fri, 12 Aug 2005, enas khalil wrote:

> hi Danny, Thanks for this help. It is now ok to tokenize my text, but the
> next step i want is to use tagger class to tag my text with own tags. how
> can i start this?

Hi Enas,

I'd strongly recommend you really go through the NLTK tutorials: the developers of NLTK have put a lot of effort into making an excellent set of tutorials. It would be a shame to waste their work.

    http://nltk.sourceforge.net/tutorial/index.html

The tutorial on Tagging seems to answer your question affirmatively.

> Also for any NLTK further help is there a specific mailing list i could go on

I think you're looking for the NLTK forums:

    http://sourceforge.net/forum/?group_id=30982

As a warning: again, read through the tutorials first before jumping in there. I suspect that many of the people who work with NLTK are researchers; they may want to see that you've done your homework before they answer your questions. In general, the guidelines in:

    http://www.catb.org/~esr/faqs/smart-questions.html

will probably apply here. Good luck to you.
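To give a feel for what "tagging with your own tags" involves before reading the tutorial, here is a minimal plain-Python sketch of a unigram tagger: each word gets the tag it carried most often in a hand-tagged training corpus. This is not NLTK's tagger API; the function names, the tiny corpus and the tag set (DET, N, V, UNK) are all hypothetical, and the words could just as well be Arabic tokens.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag(words, model, default="UNK"):
    # Words never seen during training fall back to a default tag.
    return [(w, model.get(w, default)) for w in words]

# A tiny hand-tagged corpus with made-up tags.
corpus = [
    [("the", "DET"), ("cat", "N"), ("sat", "V")],
    [("the", "DET"), ("dog", "N"), ("ran", "V")],
]
model = train_unigram(corpus)
print(tag(["the", "dog", "sat", "quietly"], model))
# [('the', 'DET'), ('dog', 'N'), ('sat', 'V'), ('quietly', 'UNK')]
```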
Re: [Tutor] pattern matching is too slow
>     while True:
>         try:
>             temp = self.mplayerOut.readline()
>             print temp
>             if re.compile("^A:").search(temp):

The point of re.compile is to compile the re once, *outside* the loop. Compiling the re is slow, so you should only do it outside.

As a first step, replace re.compile with re.search:

            if re.search("^A:", temp):
                print "abc"
        except StandardError:
            break

As a second step, move the compile before the loop:

    reg = re.compile("^A:")

Then inside the loop use the compiled expression:

            if reg.search(temp):

The first step should be faster, the second step faster still.

Finally, looking at your regex, you might be better off using a simple string method, startswith():

            if temp.startswith("A:"):

That should be even faster still.

HTH,
Alan G
Author of the Learn to Program web tutor
http://www.freenetpages.co.uk/hp/alan.gauld
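Alan's three variants are easy to compare with timeit. Python's re module keeps a cache of recently compiled patterns, so calling re.compile() with the same string inside a loop is usually cheaper than a fresh compile; measuring on your own machine is the reliable way to see how much each step buys. A sketch with made-up lines in the style of mplayer output (note that "AO:" lines do not match "^A:"):

```python
import re
import timeit

# Made-up lines in the style of mplayer's console output.
lines = ["A:   1.2 V:   1.3 A-V: -0.1", "MPlayer dev-CVS", "AO: [oss] 44100Hz 2ch"] * 1000

compiled = re.compile("^A:")

def compile_every_line():
    return [line for line in lines if re.compile("^A:").search(line)]

def compile_once():
    return [line for line in lines if compiled.search(line)]

def string_method():
    return [line for line in lines if line.startswith("A:")]

for fn in (compile_every_line, compile_once, string_method):
    print("%-20s %.4f s" % (fn.__name__, timeit.timeit(fn, number=20)))
```

All three functions select the same lines; only the cost per line differs.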
[Tutor] pychecker: x is None or x == None
We've been programming in Python for about a year. Initially we had a lot of tests of the form

    if x == None:
        do_something()

but then someone thought that we should really change these to

    if x is None:
        do_something()

However, if you run pychecker on these two snippets of code, it complains about the second, and not the first:

    x.py:6: Using is None, may not always work

So the question is, which one should we really be using? If it is the second, how do I get pychecker to shut up?

I've hunted around in the documentation, and if there is a clear discussion about this issue, I must have missed it.

Cheers
Duncan
Re: [Tutor] pychecker: x is None or x == None
Duncan Gibson wrote:
> So the question is, which one should we really be using? If it is the
> second, how do I get pychecker to shut up?

Searching comp.lang.python for 'pychecker is None' finds this discussion:

    http://groups.google.com/group/comp.lang.python/browse_frm/thread/a289d565a40fa435/9afaeb22763aadff?q=pychecker+%22is+None%22rnum=1hl=en#9afaeb22763aadff

which says that pychecker is confused by the comparison to a constant and you should ignore it.

There is a pychecker test (test_input\test90.py and test_output\test90) which shows pychecker ignoring 'is not None', so I think this is a pychecker bug in Python 2.4. It is in the bug tracker here:

    https://sourceforge.net/tracker/?group_id=24686atid=382217func=detailaid=1227538

The bug references PEP 290, which clearly says that 'is None' is the preferred test:

    http://www.python.org/peps/pep-0290.html#testing-for-none

I don't see a way to turn off this test, but I haven't looked in detail at the config options.

Kent
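A short illustration of why 'is None' is the safer test: == calls the left operand's __eq__ method, which a class can override, while 'is' checks object identity and cannot be overridden. The AlwaysEqual class below is a hypothetical, deliberately pathological example.

```python
class AlwaysEqual:
    """A hypothetical class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True

w = AlwaysEqual()
print(w == None)   # True: == dispatches to the overridden __eq__
print(w is None)   # False: identity cannot be overridden
```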
[Tutor] how to run pdb in idle
I am a newbie to Python. I use pdb on the command line quite well. In IDLE, I open the tree.py file and type:

    pdb.run('tree.py')
    <string>(1)?()
    (Pdb) l
    [EOF]
    (Pdb) list 10
    [EOF]
    (Pdb)

As you can see, it doesn't work the way it does on the command line. Can somebody tell me why? (tree.py is a file I downloaded from the net. It runs well; I am just trying to understand how it works.)

[EMAIL PROTECTED]
2005-08-12
Re: [Tutor] how to run pdb in idle
铁石 [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> I am a newbie to python. I use pdb in command line quite well.

pdb is really best used in the normal OS window.

> In idle, I open the tree.py file, and type:
> pdb.run('tree.py')
> <string>(1)?()
> (Pdb) l

While I have used pdb inside IDLE, it's not great; you are much better off using the graphical debugger built into IDLE. It's not quite as powerful as pdb, but it is easier to use! If you are on Windows, the PythonWin debugger is better than either pdb or IDLE...

Alan G
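There is also a more direct reason for the [EOF]: pdb.run() takes a string of Python source to execute, not a file name. pdb.run('tree.py') compiles the text tree.py itself under the pseudo-filename <string>, which is why the prompt shows <string>(1) and list has no source lines to display. To debug the file itself, the usual route is to run the script under pdb from a shell, as in "python -m pdb tree.py". The Python 3 sketch below shows what run() expects; add() is a hypothetical function, and the debugger's input and output are redirected to io.StringIO objects only so that the example runs unattended.

```python
import io
import pdb

def add(a, b):          # hypothetical function to debug
    return a + b

# Feed the debugger a single "continue" command and capture its output,
# so this example does not wait for keyboard input.
debugger = pdb.Pdb(stdin=io.StringIO("continue\n"), stdout=io.StringIO(),
                   nosigint=True, readrc=False)

namespace = {"add": add}
# run() takes Python source text, compiled under the name "<string>".
debugger.run("result = add(2, 3)", globals=namespace)
print(namespace["result"])  # 5
```

In an interactive session you would normally just call pdb.run("some_function()") and type the debugger commands yourself.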
Re: [Tutor] pattern matching is too slow (a little OT)
Hi again =)

That's exactly why I use -quiet with MPlayer: the output is simply too much to be parsed, and MPlayer will block waiting for you to read the buffer (so it stops playing).

My suggestion is: forget about parsing the huge amounts of output MPlayer gives, and use its slave mode instead to know where in the video you are.

Vinay Reddy wrote:
> Hi, I am writing a front-end for an application (mplayer). I used the
> popen2 call to open pipes for bi-directional communication.
> [snip]
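A rough Python 3 sketch of the slave-mode approach: start MPlayer with -slave -quiet, write a get_time_pos command to its stdin, and scan its output for the ANS_TIME_POSITION= reply. The command and reply names are taken from MPlayer's slave-mode documentation as best I recall it, so check them against your MPlayer version; the helper functions are hypothetical, and start_player() is not called here because it needs an installed mplayer and a media file.

```python
import subprocess

def parse_answer(line):
    """Split a slave-mode reply such as 'ANS_TIME_POSITION=12.3' into
    (name, value); return None for ordinary log lines."""
    line = line.strip()
    if not line.startswith("ANS_") or "=" not in line:
        return None
    name, _, value = line.partition("=")
    return name, value

def start_player(path):
    # -slave: read commands from stdin; -quiet: keep status output short.
    return subprocess.Popen(
        ["mplayer", "-slave", "-quiet", path],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        text=True, bufsize=1)

def get_time_pos(proc):
    """Ask for the playback position and return it in seconds (or None)."""
    proc.stdin.write("get_time_pos\n")
    proc.stdin.flush()
    for line in proc.stdout:
        answer = parse_answer(line)
        if answer and answer[0] == "ANS_TIME_POSITION":
            return float(answer[1])
    return None   # the player exited before answering
```

Asking for the position on demand replaces the need to parse every status line MPlayer prints.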
Re: [Tutor] pattern matching is too slow
> Reading through other posts, looks like you got somewhere with the
> nonblocking IO. Can you comment on what you did to get it working? The
> whole fcntl thing?

I am able to use non-blocking I/O, but I am unable to get the mplayer status messages. They're just not there in the mplayer output pipe. I just posted to the mplayer community regarding this.

To use non-blocking I/O,

    fcntl.fcntl(fd, fcntl.F_SETFL, os.O_NONBLOCK)

(where fd is the pipe's file descriptor) is enough. If there's nothing to read from the pipe, an exception is raised instead of the read blocking.

Regards,
Vinay
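A minimal Unix-only Python 3 sketch of the same technique, using an os.pipe() as a stand-in for the mplayer output pipe. It ORs O_NONBLOCK into the existing file-status flags rather than replacing them, and turns the "nothing to read" exception (BlockingIOError in Python 3) into an empty result; the helper names are hypothetical.

```python
import fcntl
import os

def set_nonblocking(fd):
    # Add O_NONBLOCK to the current flags instead of overwriting them.
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def read_available(fd, size=4096):
    """Return the bytes waiting on fd, or b'' if none are available yet.
    (os.read also returns b'' at end of file, once the writer closes.)"""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        # Nothing to read right now: the exception replaces blocking.
        return b""

r, w = os.pipe()            # stand-in for the mplayer output pipe
set_nonblocking(r)
print(read_available(r))    # b''
os.write(w, b"A: 1.0\n")
print(read_available(r))    # b'A: 1.0\n'
```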
Re: [Tutor] Curses example on Linux?
Hossein Movahhedian [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> ky = chr(msvcrt.getch()).
>
> The other problem is that when the program is finished the previous
> terminal state is not restored (I am using xterm on Linux).

OK, experimenting with the Linux stty command shows that

    $ stty echo -nl

will restore the terminal to the proper settings. I still haven't figured out how to do it from inside Python/curses - that's my next step!

Alan G.
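From inside Python, one standard option is curses.wrapper(): it initialises the screen, turns on cbreak and noecho modes and keypad handling, calls your function, and on the way out (even if the function raises) restores echo, leaves cbreak mode and calls endwin(), returning the terminal to its normal state. A minimal sketch for Linux; stdscr.getch() plays the role that msvcrt.getch() plays on Windows, and the isatty() guard is only there so the file can be imported without a real terminal.

```python
import curses
import sys

def main(stdscr):
    # wrapper() has already set up the screen before calling this.
    stdscr.addstr(0, 0, "Press any key to quit")
    stdscr.refresh()
    return stdscr.getch()   # key code of the pressed key

if sys.stdin.isatty():
    # The terminal is restored when wrapper() returns, so no
    # "stty echo -nl" should be needed afterwards.
    pressed = curses.wrapper(main)
    print("You pressed key code", pressed)
```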