[Tutor] parsing a "chunked" text file

2010-03-02 Thread Andrew Fithian
Hi tutor,

I have a large text file that has chunks of data like this:

headerA n1
line 1
line 2
...
line n1
headerB n2
line 1
line 2
...
line n2

Where each chunk is a header and the lines that follow it (up to the next
header). A header has the number of lines in the chunk as its second field.

I would like to turn this file into a dictionary like:
d = {'headerA': [line 1, line 2, ... , line n1], 'headerB': [line 1, line 2,
... , line n2]}

Is there a way to do this with a dictionary comprehension or do I have to
iterate over the file with a "while 1" loop?
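
A dict comprehension doesn't fit naturally here, because each chunk consumes a
variable number of lines, but a single pass over one shared iterator avoids the
"while 1" loop. A minimal sketch (the function name parse_chunks is mine, not
from the thread; written for Python 3):

```python
def parse_chunks(lines):
    """Build {header: [chunk lines]} from header/count chunks."""
    result = {}
    it = iter(lines)  # one shared iterator; headers and body lines both come from it
    for line in it:
        header, count = line.split()
        # Consume exactly `count` following lines for this chunk, so the
        # next trip around the for loop lands on the next header.
        result[header] = [next(it).rstrip("\n") for _ in range(int(count))]
    return result

sample = ["headerA 2", "line 1", "line 2", "headerB 1", "line 3"]
print(parse_chunks(sample))
# {'headerA': ['line 1', 'line 2'], 'headerB': ['line 3']}
```

The same function works on an open file object, since iterating a file yields
one line at a time.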

-Drew
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] fast sampling with replacement

2010-02-21 Thread Andrew Fithian
On Sat, Feb 20, 2010 at 11:55 AM, Luke Paireepinart
wrote:

>
>
> On Sat, Feb 20, 2010 at 1:50 PM, Kent Johnson  wrote:
>
>> On Sat, Feb 20, 2010 at 11:22 AM, Andrew Fithian 
>> wrote:
>> >  can
>> > you help me speed it up even more?
>> > import random
>> > def sample_with_replacement(list):
>> >     l = len(list) # the sample needs to be as long as list
>> >     r = xrange(l)
>> >     _random = random.random
>> >     return [list[int(_random()*l)] for i in r]
>>
>> You don't have to assign to r, just call xrange() in the list comp.
>> You can cache int() as you do with random.random()
>> Did you try random.randint(0, l-1) instead of int(_random()*l)?
>> You shouldn't call your parameter 'list', it hides the builtin list
>> and makes the code confusing.
>>
>> You might want to ask this on comp.lang.python, many more optimization
>> gurus there.
>>
> Also the function's rather short; it would help to just inline it
> (especially with Kent's modifications, it would basically boil down to a
> list comprehension, unless you keep the local refs to the functions). I
> hear the function call overhead is rather high (depending on your usage:
> if your lists are huge and you don't call the function that much, it
> might not matter).
>
The code is taking a list of length n and randomly sampling n items with
replacement from the list and then returning the sample.

I'm going to try the suggestion to inline the code before I make any of the
other (good) suggested changes to the implementation. This function is being
called thousands of times per execution; if the "function call overhead" is
as high as you say, that sounds like the place to start optimizing. Thanks
guys.
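
Combining the suggestions above, a hedged sketch of what the revised function
and its inlined form look like (parameter renamed from list to data per Kent's
note; Python 3 range in place of xrange; randrange avoids the off-by-one risk
of randint(0, l)):

```python
import random

def sample_with_replacement(data):
    # Cache module-level and builtin lookups as locals; repeated attribute
    # lookups inside the loop are what the profiler charges to this function.
    n = len(data)
    _random = random.random
    _int = int
    return [data[_int(_random() * n)] for _ in range(n)]

# Fully inlined at the call site, it reduces to one list comprehension:
data = list(range(100))
sample = [data[int(random.random() * len(data))] for _ in range(len(data))]
```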


[Tutor] fast sampling with replacement

2010-02-20 Thread Andrew Fithian
Hi tutor,

I have a statistical bootstrapping script that is bottlenecking on a
python function sample_with_replacement(). I wrote this function myself
because I couldn't find a similar function in python's random library. This
is the fastest version of the function I could come up with (I used
cProfile.run() to time every version I wrote), but it's not fast enough; can
you help me speed it up even more?

import random
def sample_with_replacement(list):
l = len(list) # the sample needs to be as long as list
r = xrange(l)
_random = random.random
return [list[int(_random()*l)] for i in r] # using
list[int(_random()*l)] is faster than random.choice(list)

FWIW, my bootstrapping script is spending roughly half of the run time in
sample_with_replacement() much more than any other function or method.
Thanks in advance for any advice you can give me.
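
For readers finding this thread later: this function did eventually land in
the standard library. random.choices (Python 3.6+) samples with replacement,
and NumPy's numpy.random.choice does the same and is typically faster inside
large bootstrap loops. A quick sketch:

```python
import random

data = [10, 20, 30, 40, 50]
# k draws with replacement, same length as the input list
sample = random.choices(data, k=len(data))
```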

-Drew


[Tutor] confidence interval

2010-01-13 Thread Andrew Fithian
Hi tutor,

I have this code for generating a confidence interval from an array of
values:
  import numpy as np
  from scipy import stats
  def mean_confidence_interval(data, confidence=0.95):
      a = 1.0*np.array(data)
      n = len(a)
      m, se = np.mean(a), stats.sem(a)
      h = se * stats.t.ppf((1+confidence)/2., n-1)
      return m, m-h, m+h

This works but I feel there's a better and more succinct way to do this.
Does anyone know of an existing python package that can do this for me?
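
scipy can compute this directly: scipy.stats.t.interval takes the confidence
level, degrees of freedom, location, and scale, which collapses the function
above to a couple of lines. A sketch, assuming a two-sided interval around the
sample mean (the data values are made up for illustration):

```python
import numpy as np
from scipy import stats

data = [2.1, 2.5, 2.8, 3.0, 3.2, 2.9, 2.7]
a = np.asarray(data, dtype=float)
m = a.mean()
# 95% two-sided t interval centered on the mean, scaled by the standard error
low, high = stats.t.interval(0.95, len(a) - 1, loc=m, scale=stats.sem(a))
```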

Thanks,

-Drew


Re: [Tutor] When are strings interned?

2009-07-02 Thread Andrew Fithian
My guess is that Python sees the space and decides it isn't a short
string so it doesn't get interned. I got the expected results when I
used 'green_ideas' instead of 'green ideas'.
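
Whether two equal literals share one object is a CPython implementation
detail, and it can even differ between a script and the interactive prompt,
since equal constants within a single compiled unit may be deduplicated. To
make the sharing explicit regardless of the compiler's choices, there is
sys.intern (Python 3 spelling; in 2.x it was the builtin intern). A small
sketch:

```python
import sys

# Interning guarantees a single shared object for equal strings, even ones
# with spaces that the compiler wouldn't intern on its own.
p = sys.intern("green ideas")
q = sys.intern("green ideas")
print(p is q)  # True
```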
-Drew



On Wed, Jul 1, 2009 at 6:43 PM, Marc Tompkins wrote:
> On Wed, Jul 1, 2009 at 5:29 PM, Robert Berman  wrote:
>>
>> > >>> n = "colourless"
>> > >>> o = "colourless"
>> > >>> n == o
>> > True
>> > >>> n is o
>> > True
>> > >>> p = "green ideas"
>> > >>> q = "green ideas"
>> > >>> p == q
>> > True
>> > >>> p is q
>> > False
>> >
>> > Why the difference?
>>
>> The string p is equal to the string q. The object p is not the object q.
>>
>> Robert
>
> Yes, but did you read his first example?  That one has me scratching my
> head.
> --
> www.fsrtechnologies.com
>