Dave Angel wrote:
> elsa wrote:
>> Hi guys,
>>
>> I've got a problem with my program, in that the code just takes too
>> long to run. Here's what I'm doing. If anyone has any tips, they'd be
>> much appreciated!
>>
>> So, say I have a list of lists that looks something like this (I'm
>> using a list of lists, rather than a list of tuples, as I need it to
>> be mutable):
>>
>> myList = [[786,0],[45, 1],[673,1],...................[23,46]]
>>
>> there are enough entries in the outer list, that the sum of myList[i]
>> [0] across all i could be as high as 10^7.
>>
>> Now, what I need to do is randomly choose one myList[i], however the
>> distribution of my random choice needs to be proportional to the
>> values of myList[i][0]. So, for this list above, I'd have a much
>> higher chance of choosing myList[0] than myList[1].
>>
>> Here is how I'm doing it at the moment:
>>
>> def chooseI(myList):
>>     mySum=0
>>     choice = random.choice(range(1,sum([i[0] for i in myList])+1))
>>     for i in range(len(myList)):
>>         mySum+=myList[i][0]
>>         if mySum>=choice:
>>             return i
>>             break
>>
>> This works just fine if sum([i[0] for i in myList]) is less than
>> 10,000, say. However if its up around 10^7, the whole thing crashes.
>> Is there a faster way of doing this, that doesn't involve as many
>> computational steps?
>>
>> Thanks!
>>
>> elsa
>>
>>   
> "the whole thing crashes"  -- give us the specific crash you see,
> including the stack trace.  "Crash" is far too vague.
> 
> At first glance you could be running out of memory.  But not with the
> values you're quoting.  If a typical average value for i[0] is 50,
> that's only 200k elements.  Anyway, you could print out the len(myList)
> to see how big the list is.
> 
> There are quicker ways to do the sum, but the first thing to change is
> the random.choice() stuff.  Once you have a sum, you can use
> random.randint() on it directly.  No need to make a range list.
> 
> Another approach might be to build a new list with a running sum in it. 
> Then after doing the randint() on the last (highest) item, you can do a
> bisect.bisect on that list.  The index that returns will be your return
> value.
> 
Your approach seems to  validate an informal impression I had that with
N choices one ought to be able to build a binary tree where each
decision went to left or right according to whether a number drawn from
a linear [0,1) distribution was greater that some arbitrary margin, or not.

There are then arithmetical questions of exactly how to arrive at the
correct threshold values for each bifurcation.  If the most probable
paths are deliberately placed at the top of the tree then the most
frequent cases will be fastest to generate, also.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/

-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to