[Tutor] parsing a "chunked" text file
Hi tutor,

I have a large text file that has chunks of data like this:

headerA n1
line 1
line 2
...
line n1
headerB n2
line 1
line 2
...
line n2

Where each chunk is a header and the lines that follow it (up to the next header). A header has the number of lines in the chunk as its second field. I would like to turn this file into a dictionary like:

dict = {'headerA': [line 1, line 2, ..., line n1], 'headerB': [line 1, line 2, ..., line n2]}

Is there a way to do this with a dictionary comprehension, or do I have to iterate over the file with a "while 1" loop?

-Drew

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
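[A sketch of one way to parse such a file. No comprehension is needed; a plain `for` loop over the file handle works, because calling `next(f)` inside the loop consumes the chunk's body lines. This assumes each header line has the chunk name and the line count as its first two whitespace-separated fields, as described above; the function and variable names are illustrative, not from the thread.]

```python
def parse_chunked(path):
    """Read a chunked file into {header_name: [line, ...]}.

    Assumes each header line has the chunk name and the line count
    as its first two whitespace-separated fields.
    """
    chunks = {}
    with open(path) as f:
        for line in f:  # each iteration lands on a header line
            name, count = line.split()[:2]
            # next(f) advances the same iterator, consuming the body
            chunks[name] = [next(f).rstrip("\n") for _ in range(int(count))]
    return chunks
```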
[Tutor] fast sampling with replacement
On Sat, Feb 20, 2010 at 11:55 AM, Luke Paireepinart wrote:
>
> On Sat, Feb 20, 2010 at 1:50 PM, Kent Johnson wrote:
>>
>> On Sat, Feb 20, 2010 at 11:22 AM, Andrew Fithian wrote:
>> > can you help me speed it up even more?
>> >
>> > import random
>> > def sample_with_replacement(list):
>> >     l = len(list) # the sample needs to be as long as list
>> >     r = xrange(l)
>> >     _random = random.random
>> >     return [list[int(_random()*l)] for i in r]
>>
>> You don't have to assign to r, just call xrange() in the list comp.
>> You can cache int() as you do with random.random().
>> Did you try random.randint(0, l) instead of int(_random()*l)?
>> You shouldn't call your parameter 'list'; it hides the builtin list
>> and makes the code confusing.
>>
>> You might want to ask this on comp.lang.python, many more optimization
>> gurus there.
>
> Also the function's rather short, it would help to just inline it (esp.
> with Kent's modifications, it would basically boil down to a list
> comprehension, unless you keep the local refs to the functions). I hear
> the function call overhead is rather high (depending on your usage - if
> your lists are huge and you don't call the function that much it might
> not matter.)

The code is taking a list of length n and randomly sampling n items with replacement from the list, then returning the sample. I'm going to try the suggestion to inline the code before I make any of the other (good) suggested changes to the implementation. This function is being called thousands of times per execution; if the "function call overhead" is as high as you say, that sounds like the place to start optimizing. Thanks guys.
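[A sketch of what the thread's suggestions might look like combined, written for Python 3 (`range` rather than `xrange`). One caution not spelled out above: `random.randint(0, l)` is inclusive at both ends, so it would occasionally produce the out-of-range index `l`; `random.randrange(l)` gives the intended half-open range. The parameter name `seq` is illustrative.]

```python
import random

def sample_with_replacement(seq):
    """Draw len(seq) items from seq with replacement.

    Applies the thread's suggestions: the parameter no longer shadows
    the builtin `list`, and the method lookup is cached outside the
    comprehension. randrange(n) is used instead of randint(0, n),
    since randint is inclusive at both ends and would occasionally
    index one past the end of the sequence.
    """
    n = len(seq)
    _randrange = random.randrange  # cache the attribute lookup
    return [seq[_randrange(n)] for _ in range(n)]
```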
[Tutor] fast sampling with replacement
Hi tutor,

I have a statistical bootstrapping script that is bottlenecking on a python function, sample_with_replacement(). I wrote this function myself because I couldn't find a similar function in python's random library. This is the fastest version of the function I could come up with (I used cProfile.run() to time every version I wrote), but it's not fast enough; can you help me speed it up even more?

import random
def sample_with_replacement(list):
    l = len(list) # the sample needs to be as long as list
    r = xrange(l)
    _random = random.random
    # using list[int(_random()*l)] is faster than random.choice(list)
    return [list[int(_random()*l)] for i in r]

FWIW, my bootstrapping script is spending roughly half of the run time in sample_with_replacement(), much more than any other function or method. Thanks in advance for any advice you can give me.

-Drew
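[For a bootstrapping workload specifically, a different angle than micro-optimizing the Python loop is to vectorize the whole resampling step with NumPy, drawing every random index for every resample in one call. This is a hedged sketch, not from the thread; it assumes the data fits comfortably in a NumPy array and that per-item Python overhead is the actual bottleneck. The function name is illustrative.]

```python
import numpy as np

def bootstrap_samples(data, n_resamples):
    """Return an (n_resamples, len(data)) array of bootstrap resamples.

    Drawing all the random indices in a single vectorized call avoids
    per-item Python interpreter overhead entirely.
    """
    arr = np.asarray(data)
    # randint's upper bound is exclusive, so indices stay in range
    idx = np.random.randint(0, len(arr), size=(n_resamples, len(arr)))
    return arr[idx]
```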
[Tutor] confidence interval
Hi tutor,

I have this code for generating a confidence interval from an array of values:

import numpy as np
import scipy as sp
import scipy.stats

def mean_confidence_interval(data, confidence=0.95):
    a = 1.0*np.array(data)
    n = len(a)
    m, se = np.mean(a), sp.stats.stderr(a)
    h = se * sp.stats.t._ppf((1+confidence)/2., n-1)
    return m, m-h, m+h

(Note the explicit `import scipy.stats` - importing scipy alone does not make the stats subpackage available.)

This works, but I feel there's a better and more succinct way to do this. Does anyone know of an existing python package that can do this for me?

Thanks,
-Drew
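[A more succinct version is possible with the public scipy.stats API rather than the private `t._ppf`: `scipy.stats.sem` computes the standard error of the mean (`scipy.stats.stderr` was later removed in its favor), and `scipy.stats.t.interval` returns both interval endpoints directly. A hedged sketch, assuming a reasonably recent SciPy:]

```python
import numpy as np
import scipy.stats as st

def mean_confidence_interval(data, confidence=0.95):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    a = np.asarray(data, dtype=float)
    m = a.mean()
    se = st.sem(a)  # standard error of the mean
    # t.interval handles the (1+confidence)/2 quantile arithmetic itself
    lo, hi = st.t.interval(confidence, len(a) - 1, loc=m, scale=se)
    return m, lo, hi
```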
Re: [Tutor] When are strings interned?
My guess is that Python sees the space and decides it isn't a short string, so it doesn't get interned. I got the expected results when I used 'green_ideas' instead of 'green ideas'.

-Drew

On Wed, Jul 1, 2009 at 6:43 PM, Marc Tompkins wrote:
> On Wed, Jul 1, 2009 at 5:29 PM, Robert Berman wrote:
>>
>> > >>> n = "colourless"
>> > >>> o = "colourless"
>> > >>> n == o
>> > True
>> > >>> n is o
>> > True
>> > >>> p = "green ideas"
>> > >>> q = "green ideas"
>> > >>> p == q
>> > True
>> > >>> p is q
>> > False
>> >
>> > Why the difference?
>>
>> The string p is equal to the string q. The object p is not the object q.
>>
>> Robert
>
> Yes, but did you read his first example? That one has me scratching my
> head.
> --
> www.fsrtechnologies.com
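[Drew's guess matches how CPython behaves: string literals that look like identifiers (letters, digits, underscores only) are typically interned automatically at compile time, while "green ideas" is not, because of the space. This is a CPython implementation detail, not a language guarantee, so `is` on equal strings can differ between versions and between the REPL and a script. When identity matters, interning can be forced explicitly, which a short sketch shows:]

```python
import sys  # sys.intern in Python 3; it was the builtin intern() in Python 2

# Identifier-like literals are usually auto-interned by CPython, but
# "green ideas" (with a space) is not.  sys.intern() forces both names
# to refer to the single canonical copy, so `is` becomes reliable:
a = sys.intern("green ideas")
b = sys.intern("green ideas")
print(a is b)  # prints True: both names point at the interned object
```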