Let's see if I understood correctly, you want to resolve a O(n^2) problem, where
out = [any(adj(input(i), input(j)) for j in G(input, i)) for i in range(len(input)] (in a very un-pythonic way, I’m cycling on indexes instead of values – just because I think it helps understand better). The problem is that I don't know what G is. N elements adjacent to i in the input vector? (in that case the problem is O(n)). A completely arbitrary, unpredictable function that extracts N elements from the whole input array? (O(n^2)) The following is generic solution that works in the latter, worse case, even in the case where G(input, i) returns a vector as large as input, and allows to solve the problem on an arbitrarily big input (although performance will be hurt). You're trying to solve adj(input(i), input(j)) for all points of a square, each point having coordinates (i,j). You can visualize the input array as being both the vertical and the horizontal edge of the square. Now split the edge in N segments, small enough to fit in your memory; you're thus producing N^2 subsquares. You now have to resolve the problem on all subsquares in turn, loading only one or two at a time into memory. Let's say N=3. Your problem becomes inp1, inp2, inp3 = split(input, 3) out11 = [any(adj(inp1(i), inp1(j)) for j in G(inp1, i)) for i in range(len(inp1)] out12 = [any(adj(inp1(i), inp2(j)) for j in G(inp2, i)) for i in range(len(inp1)] out13 = [any(adj(inp1(i), inp3(j)) for j in G(inp3, i)) for i in range(len(inp1)] out21 = [any(adj(inp2(i), inp1(j)) for j in G(inp1, i)) for i in range(len(inp2)] out22 = [any(adj(inp2(i), inp2(j)) for j in G(inp2, i)) for i in range(len(inp2)] out23 = [any(adj(inp2(i), inp3(j)) for j in G(inp3, i)) for i in range(len(inp2)] out31 = [any(adj(inp3(i), inp1(j)) for j in G(inp1, i)) for i in range(len(inp3)] out32 = [any(adj(inp3(i), inp2(j)) for j in G(inp2, i)) for i in range(len(inp3)] out33 = [any(adj(inp3(i), inp3(j)) for j in G(inp3, i)) for i in range(len(inp3)] Then you have to recombine out. This MAY OR MAY NOT work for you, depending on the math properties of G(). The math property I'm relying on is that I can apply G to a subset of input, recombine the output afterwards and obtain the same result. This may not be the case, e.g. G(input, i) = “the 4 nodes whose value is closest to input(i)”. You can still solve this case, but it adds an additional complication: for every output point, you’ll have to save the input that originated it as well, and recursively apply G to the “local winner” inputs when recombining. As you can see, this solution presents much more memory transfers than the monolithic one and considerably more complex to implement on the python side - this can't be helped. However, you'll mitigate the performance degradation if you load the next buffer while you calculate the current one, e.g. load everything needed to calculate out12 while you’re calculating out11. HTH On 1 Oct 2013 06:13, "Tyler Hardin" <[email protected]> wrote: > Hi all, I'm writing an opencl kernel to find maximal cliques. The kernel > selects a node by the thread gid and finds the max subgraph containing that > node. Unfortunately, around 10000 nodes, I get an out of memory error. The > program is for one of my math professors and tests a conjecture of his, so > I can't post the code in it's entirety, but I'll give a simplified example. > > I've tried cutting down the size of the static arrays in the functions > that have them. Is there anything else I can do other than get a card with > more memory? > > Thanks for your time, > Tyler > > #!/usr/bin/python2.7 > > import pyopencl as cl > import numpy > > def max_clique(nodes_list, resid_list): > resid_list.append(0) > > num_nodes = numpy.int32(len(nodes_list)) > num_resid = numpy.int32(len(resid_list)) > > nodes_arr = numpy.array(nodes_list, dtype='i4') > resid_arr = numpy.array(resid_list, dtype='i4') > > ctx = cl.create_some_context() > queue = cl.CommandQueue(ctx) > > mf = cl.mem_flags > nodes_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, > hostbuf=nodes_arr) > resid_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, > hostbuf=resid_arr) > dest_buf = cl.Buffer(ctx, mf.WRITE_ONLY, nodes_arr.nbytes) > > prg = cl.Program(ctx, """ > bool adj(int node1, int node2, __global int *adj) > { > // are node1 and node2 adjacent? > } > > // checks if node is adj to all nodes in chain > bool adj_all(int* nodes, int* G, int chain_len, int node, __global > int* adj){ > // is node adjacent to to all nodes in G? > } > > #define MAX_NODES 64 > int chain(int num_nodes, int* nodes, __global int* connect) > { > int cur[MAX_NODES]; > > // Find max clique in G > } > > __kernel void max_clique(__global int *nodes, > __global int *resid, > __global int *out, > int num_nodes, > int num_resid) > { > int gid = get_global_id(0); > int G[constant]; > // find subgraph around nodes[gid], we call it G, store > // it's nodes in G > out[gid] = // max clique of G; > } > """).build() > > prg.max_clique(queue, nodes_arr.shape, None, nodes_buf, > resid_buf, > dest_buf, > num_nodes, > num_resid) > cliques = numpy.empty_like(nodes_arr) > cl.enqueue_copy(queue, cliques, dest_buf) > return max(cliques) > > print max_clique(range(0, 10000), range(1, 10)) > > _______________________________________________ > PyOpenCL mailing list > [email protected] > http://lists.tiker.net/listinfo/pyopencl > >
_______________________________________________ PyOpenCL mailing list [email protected] http://lists.tiker.net/listinfo/pyopencl
