Re: [Pytables-users] Blocked access using PyTables and matrix multiplication
Hi Antonio:

Yes, those slides look like what I was looking for. Thanks for taking the time to look over the blocking technique I described. It seems to work pretty well for the datasets we're dealing with.

Brad

On Sun, Dec 11, 2011 at 5:45 AM, Antonio Valentino wrote:
> Hi Brad,
>
> On 10/12/2011 20:36, Brad Buran wrote:
>> I found a very old discussion on this mailing list
>> (http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg01295.html),
>> but the link Francesc provided to the slides from EuroSciPy
>> describing how to use a blocking technique with PyTables no longer
>> works. Does anyone have access to the original slides?
>
> The material you are looking for is probably at
>
> http://www.pytables.org/moin/HowToUse#Presentations
>
>> I'm assuming that the blocking technique is as simple as determining a
>> chunk size to operate on and then looping through the PyTables Array,
>> loading the chunk into memory, running np.dot and scipy.signal.filtfilt
>> and saving the result to a new PyTables array, but I was curious to
>> see if the slides point out any subtleties of this approach that I
>> should be aware of.
>
> Yes, this is the idea.
> Surely Francesc can provide very useful hints about this topic.
> For my part, I suggest choosing the chunk shape very carefully
> when you generate your datasets.
>
> Best regards
>
> --
> Antonio Valentino

___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
Re: [Pytables-users] Blocked access using PyTables and matrix multiplication
Hi Brad,

On 10/12/2011 20:36, Brad Buran wrote:
> I found a very old discussion on this mailing list
> (http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg01295.html),
> but the link Francesc provided to the slides from EuroSciPy
> describing how to use a blocking technique with PyTables no longer
> works. Does anyone have access to the original slides?

The material you are looking for is probably at

http://www.pytables.org/moin/HowToUse#Presentations

> I'm assuming that the blocking technique is as simple as determining a
> chunk size to operate on and then looping through the PyTables Array,
> loading the chunk into memory, running np.dot and scipy.signal.filtfilt
> and saving the result to a new PyTables array, but I was curious to
> see if the slides point out any subtleties of this approach that I
> should be aware of.

Yes, this is the idea.
Surely Francesc can provide very useful hints about this topic.
For my part, I suggest choosing the chunk shape very carefully when you generate your datasets.

Best regards

--
Antonio Valentino
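
The advice about chunk shape can be made concrete with a little arithmetic. HDF5 stores a chunked array as fixed-size tiles and, as far as I understand, reads each tile as a whole, so the cost of a block read grows with the number of tiles the block overlaps. The sketch below only counts those tiles. The helper name `chunks_touched` and the example shapes are invented for illustration and do not come from the thread or from PyTables.

```python
def chunks_touched(block_shape, chunkshape, start=(0, 0)):
    """Count the storage chunks overlapped by one rectangular block read.

    block_shape -- (rows, cols) of the block being read
    chunkshape  -- (rows, cols) of one storage chunk (hypothetical values)
    start       -- (row, col) offset of the block within the array
    """
    count = 1
    for offset, length, chunk in zip(start, block_shape, chunkshape):
        first = offset // chunk                 # index of first chunk hit
        last = (offset + length - 1) // chunk   # index of last chunk hit
        count *= last - first + 1
    return count


# Reading a 16 x 100000 block under three hypothetical chunk shapes.
row_chunks = chunks_touched((16, 100000), (1, 8192))       # 208 chunks
tall_chunks = chunks_touched((16, 100000), (16, 8192))     # 13 chunks
exact_chunks = chunks_touched((16, 100000), (16, 100000))  # 1 chunk
```

With these made-up numbers, a chunk shape that spans all 16 rows needs far fewer chunk reads per block than one row per chunk. A block that is not aligned to the chunk grid touches extra chunks, which is one reason to pick the processing block size as a multiple of the chunk width.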
[Pytables-users] Blocked access using PyTables and matrix multiplication
I am trying to speed up some analysis routines on arrays that are approximately 16 x 200,000,000 elements (stored in HDF5 arrays that were originally created with PyTables). I was looking into whether I could speed up the analysis using tricks such as memmap and numexpr; however, since I need to perform row-wise operations (e.g. computing the dot product with a 16x16 array followed by a scipy.signal.filtfilt operation), which require indexing, I do not believe I can use numexpr. This leaves "memmapping", but I understand that PyTables offers something similar.

I found a very old discussion on this mailing list (http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg01295.html), but the link Francesc provided to the slides from EuroSciPy describing how to use a blocking technique with PyTables no longer works. Does anyone have access to the original slides?

I'm assuming that the blocking technique is as simple as determining a chunk size to operate on and then looping through the PyTables Array, loading the chunk into memory, running np.dot and scipy.signal.filtfilt and saving the result to a new PyTables array, but I was curious to see if the slides point out any subtleties of this approach that I should be aware of.

If I understand correctly, the blocking approach is as simple as the following:

    import numpy as np
    import scipy.signal
    import tables

    # diff is a 16x16 array; b and a are the filter coefficients.
    # f_in must be open for writing ('a' or 'r+') to create the result array.
    source = f_in.root.data
    dest = f_in.createCArray('/', 'result', atom=tables.Float32Atom(),
                             shape=source.shape)
    n_samples = source.shape[1]
    for start in range(0, n_samples, chunksize):
        stop = min(start + chunksize, n_samples)  # last block may be shorter
        block = source[:, start:stop]             # load one block into memory
        result = np.dot(diff, block)
        result = scipy.signal.filtfilt(b, a, result)  # filters along each row
        dest[:, start:stop] = result

Thanks!
Brad
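
The blocked loop in the post above can be checked on a toy example without PyTables. The sketch below is a minimal illustration in plain Python lists, not the code from the thread. It walks the columns in blocks, including a shorter final block, and confirms that the block-wise matrix product matches the product computed in one pass. The names `block_bounds`, `matmul` and `blocked_matmul` are hypothetical helpers introduced here.

```python
def block_bounds(n_samples, chunksize):
    """Yield (start, stop) column ranges that cover 0..n_samples."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, n_samples, chunksize):
        yield start, min(start + chunksize, n_samples)


def matmul(a, b):
    """Product of a (m x k) and b (k x n), both given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def blocked_matmul(a, b, chunksize):
    """Same product, computed one block of columns of b at a time."""
    n_samples = len(b[0])
    out = [[0] * n_samples for _ in a]
    for start, stop in block_bounds(n_samples, chunksize):
        block = [row[start:stop] for row in b]   # "load" one block
        result = matmul(a, block)
        for i, row in enumerate(result):         # "store" the block result
            out[i][start:stop] = row
    return out


weights = [[1, 2], [3, 4]]
data = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
same = blocked_matmul(weights, data, 2) == matmul(weights, data)  # True
```

The matrix product treats every column independently, so blocking along the long axis does not change its result. The filtering step is different: a filter applied block by block sees the edges of each block, so its output near block boundaries can differ from filtering the whole record at once. Overlapping neighbouring blocks by some margin is one common way to reduce that effect.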