I was hoping someone could help me out here.

This is from a post I put up on StackOverflow:

I have a fairly large dataset that I store in HDF5 and access using
PyTables. One operation I need to do on this dataset is a pairwise
comparison between each of the elements. This requires 2 loops, one to
iterate over each element, and an inner loop to iterate over every other
element. This operation thus performs N(N-1)/2 comparisons.
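
To make the count concrete, here is a minimal sketch on a toy list. The names `pair_count` and `items` are made up for illustration and are not from my real code. It visits each unordered pair once, the same way the nested loops do, and checks the total against N(N-1)/2.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs among n elements: N(N-1)/2."""
    return n * (n - 1) // 2


# Toy stand-in for the dataset; only its length matters here.
items = ["a", "b", "c", "d", "e"]

# Same traversal as the two nested loops: every (ii, jj) with ii < jj.
pairs = list(combinations(range(len(items)), 2))

# The enumerated pairs match the closed-form count.
assert len(pairs) == pair_count(len(items))
```

The count grows quadratically, so doubling N roughly quadruples the number of comparisons.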

For fairly small sets I found it to be faster to dump the contents into a
multidimensional numpy array and then do my iteration. With large sets this
runs into memory problems, so I need to access each element of the dataset
at run time instead.
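
For a rough sense of the memory problem, here is a back-of-the-envelope sketch. It assumes each element holds 1e5 values of 8 bytes each (float64), as in the example further down. `array_bytes` is a hypothetical helper and the N values are made up for illustration; they are not my real dataset sizes.

```python
def array_bytes(n_elements, element_len=100000, bytes_per_value=8):
    """Bytes needed to hold all elements in one dense 2-D array.

    Assumes element_len values per element and bytes_per_value bytes
    per value (8 for float64).
    """
    return n_elements * element_len * bytes_per_value


# Illustrative sizes only.
for n in (1000, 10000, 100000):
    gib = array_bytes(n) / float(2 ** 30)
    print("N = %6d -> %.1f GiB" % (n, gib))
```

This counts only the array of elements; the N x N result matrix D needs additional memory on top of that.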

Putting the elements into an array gives me about 600 comparisons per
second, while operating on the HDF5 data directly gives me about 300
comparisons per second.

Is there a way to speed this process up?

Example follows (this is not my real code, just an example):

*Small Set*:

import numpy as np
import tables as tb

with tb.openFile(h5_file, 'r') as f:
    data = f.root.data

    N_elements = len(data)
    elements = np.empty((N_elements, int(1e5)))

    # copy every element into memory once
    for ii, d in enumerate(data):
        elements[ii] = d['element']

# compare each unordered pair (ii < jj) using the in-memory copy
D = np.empty((N_elements, N_elements))
for ii in xrange(N_elements):
    for jj in xrange(ii+1, N_elements):
        D[ii, jj] = compare(elements[ii], elements[jj])

*Large Set*:

with tb.openFile(h5_file, 'r') as f:
    data = f.root.data

    N_elements = len(data)

    D = np.empty((N_elements, N_elements))
    for ii in xrange(N_elements):
        for jj in xrange(ii+1, N_elements):
            # both elements are read from the HDF5 file at comparison time
            D[ii, jj] = compare(data['element'][ii], data['element'][jj])
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
