I am working on solving a recent recreational mathematical problem on 
Project Euler <http://projecteuler.net>  . I have a solution, which works
fine for small N up to 10^5 but it takes too long to compute for the actual
problem, where N is of the order 2*10^7. The problem is nested loops, and I
am hoping to avoid one level in the loop by using clever numpy magic (as I
have done often before). However, there is one step, which I cannot figure
out how to do using numpy operations alone, and I am hoping for some help

The subproblem is that I have in principle k = 1, ..., N sets of boolean
arrays
f_k and g_k each of length N.

For each k I need to find the number of elements i where both f_k[i] and
g_k[i] are True and sum that up over all N values of k.

A problem of the order 4*10^14 if I just do it brute force. This takes way
too long (there is a one minute rule).

However, after a lot of thinking and by using some properties of the f_k and
g_k I have managed to construct using only pure numpy function and only a
single loop over k, arrays

f_k_changes_at
g_k_changes_at

which contain the indices i at which the functions change it boolean value
from True to False or False to True.

It so happens that the number of changes is only a small fraction of N, the
fraction decreases with larger N, so the size of these changes_at arrays
contains perhaps only 1000 elements instead of 10000000 for each k, a
significant reduction of complexity.

Now, my problem is to figure out for how many values of i both f_k and g_k
are True given the changes_at arrays.

As this may be a little hard to understand here is a specific example of how
these arrays can look like for k = 2 and N = 150

f_2_changes_at = [  2   3  39  41  58  59  65  66  93 102 145]
g_2_changes_at = [  2  94 101 146 149]

with the boundary condition that f_2[0] = g_2[0] = False

Which expands to
i    f_2    g_2   f_2 and g_2
0     F       F                  F
1     F       F                  F <-
2     T      T                  T <-
3     F      T                   F
4     F      T                   F
...
38   F      T                   F <-
39   T      T                   T
40   T      T                   T <-
41   F      T                   F
42   F      T                   F
...
57   F      T                   F <-
58   T      T                   T <-
59   F      T                   F
60   F      T                   F
...
64   F      T                   F <-
65   T      T                   T <-             
66   F      T                   F
...
92   F      T                   F <-
93   T      T                   T <-
94   T      F                   F
...
100  T     F                   F <- 
101  T     T                   T <-
102  F     T                   F
...
144  F     T                   F <-
145  T     T                   T <-
146  T     F                   F
147  T     F                   F
148  T     F                   F <-
149  T     T                   T <-

With the sum of elements fulfilling the condition being (see arrows)

(2 - 1) + (40 - 38) + (58 - 57) + (65 - 64) + (93 - 92) + (101 - 100) + (145
- 144) + (149 - 148) =
1 + 2 + 1 + 1 + 1 + 1 + 1 + 1 = 9

So, is there a numpy recipe for doing the equivalent process without
expanding it into the full arrays?

I have tried looping over each element in the changes_at arrays and build up
the sums, but that is too inefficient as I then have an inner for loop
containing conditional branching code

Thanks in advance, Slaunger



--
View this message in context: 
http://numpy-discussion.10968.n7.nabble.com/Is-there-a-pure-numpy-recipe-for-this-tp37077.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to