On 04/22/2013 05:32 PM, Blind Anagram wrote:
On 22/04/2013 22:03, Oscar Benjamin wrote:
On 22 April 2013 21:18, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
On 22 April 2013 17:38, Blind Anagram <blindanag...@nowhere.org> wrote:
On 22/04/2013 17:06, Oscar Benjamin wrote:

I don't know what your application is but I would say that my first
port of call here would be to consider a different algorithmic
approach. An obvious question would be about the sparsity of this data
structure. How frequent are the values that you are trying to count?
Would it make more sense to store a list of their indices?

Actually it is no more than a simple prime sieve implemented as a Python
class (and, yes, I realize that there are plenty of these around).

If I understand correctly, you have a list of roughly a billion
True/False values indicating which integers are prime and which are
not. You would like to discover how many prime numbers there are
between two numbers a and b. You currently do this by counting the
number of True values in your list between the indices a and b.

If my description is correct then I would definitely consider using a
different algorithmic approach. The density of primes from 1 to 1
billlion is about 5%. Storing the prime numbers themselves in a sorted
list would save memory and allow a potentially more efficient way of
counting the number of primes within some interval.

In fact it is probably quicker if you don't mind using all that memory
to just store the cumulative sum of your prime True/False indicator
list. This would be the prime counting function pi(n). You can then
count the primes between a and b in constant time with pi[b] - pi[a].

I did wonder whether, after creating the sieve, I should simply go
through the list and replace the True values with a count.  This would
certainly speed up the prime count function, which is where the issue
arises.  I will try this and see what sort of performance trade-offs
this involves.


By doing that replacement, you'd increase memory usage manyfold (maybe 3:1, I don't know). As long as you're only using bools in the list, you only have the list overhead to consider, because all the objects involved are already cached (True and False exist only once each). If you have integers, you'll need a new object for each nonzero count.



--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to