Re: [Pytables-users] Blocked access using PyTables and matrix multiplication

2011-12-12 Thread Brad Buran
Hi Antonio:

Yes, those slides look like what I was looking for.  Thanks for taking
the time to look over the blocking technique I described.  It seems to
work pretty well for the datasets we're dealing with.

Brad

On Sun, Dec 11, 2011 at 5:45 AM, Antonio Valentino wrote:
> Hi Brad,
>
> On 10/12/2011 20:36, Brad Buran wrote:
>> I am trying to speed up some analysis routines on arrays that are
>> approximately 16 x 200,000,000 elements (stored in HDF5 arrays that
>> were originally created with PyTables).  I was looking into whether I
>> could speed up the analysis using tricks such as memmap and numexpr;
>> however, since I need to perform row-wise operations (e.g. computing
>> the dot product with a 16x16 array followed by a scipy.signal.filtfilt
>> operation) which require indexing, I do not believe I can use
>> numexpr.  This leaves "memmapping", but I understand that PyTables
>> offers something similar.  I found a very old discussion on this
>> mailing list 
>> (http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg01295.html),
>> but the link Francesc provided to the slides from Euro Scipy
>> describing how to use a blocking technique with PyTables no longer
>> works.  Does anyone have access to the original slides?
>>
>
> The material you are looking for is probably at
>
> http://www.pytables.org/moin/HowToUse#Presentations
>
>> I'm assuming that the blocking technique is as simple as determining a
>> chunk size to operate on and then looping through the PyTables Array,
>> loading the chunk into memory, running np.dot and scipy.signal.filtfilt
>> and saving the result to a new PyTables array, but I was curious to
>> see if the slides point out any subtleties of this approach that I
>> should be aware of.
>>
>> If I understand correctly, the blocking approach is as simple as the 
>> following:
>>
>> # note that diff is a 16x16 array
>> source = f_in.root.data
>> dest = f_in.createCArray('/', 'result', atom=tables.Float32Atom(),
>> shape=source.shape)
>> temp = np.empty((16, chunksize))
>> for chunk in range(n_chunks):
>>    block = source[:, chunk*chunksize:chunk*chunksize+chunksize]
>>    result = np.dot(diff, block, out=temp)
>>    result = scipy.signal.filtfilt(b, a, result)
>>    dest[:, chunk*chunksize:chunk*chunksize+chunksize] = result
>>
>> Thanks!
>> Brad
>
> Yes, this is the idea.
> Francesc can surely provide very useful hints on this topic.
> For my part, I suggest choosing the chunk shape very carefully
> when you generate your datasets.
>
> Best regards
>
> --
> Antonio Valentino
>
> --
> Learn Windows Azure Live!  Tuesday, Dec 13, 2011
> Microsoft is holding a special Learn Windows Azure training event for
> developers. It will provide a great way to learn Windows Azure and what it
> provides. You can attend the event by watching it streamed LIVE online.
> Learn more at http://p.sf.net/sfu/ms-windowsazure
> ___
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users



Re: [Pytables-users] Blocked access using PyTables and matrix multiplication

2011-12-11 Thread Antonio Valentino
Hi Brad,

On 10/12/2011 20:36, Brad Buran wrote:
> I am trying to speed up some analysis routines on arrays that are
> approximately 16 x 200,000,000 elements (stored in HDF5 arrays that
> were originally created with PyTables).  I was looking into whether I
> could speed up the analysis using tricks such as memmap and numexpr;
> however, since I need to perform row-wise operations (e.g. computing
> the dot product with a 16x16 array followed by a scipy.signal.filtfilt
> operation) which require indexing, I do not believe I can use
> numexpr.  This leaves "memmapping", but I understand that PyTables
> offers something similar.  I found a very old discussion on this
> mailing list 
> (http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg01295.html),
> but the link Francesc provided to the slides from Euro Scipy
> describing how to use a blocking technique with PyTables no longer
> works.  Does anyone have access to the original slides?
> 

The material you are looking for is probably at

http://www.pytables.org/moin/HowToUse#Presentations

> I'm assuming that the blocking technique is as simple as determining a
> chunk size to operate on and then looping through the PyTables Array,
> loading the chunk into memory, running np.dot and scipy.signal.filtfilt
> and saving the result to a new PyTables array, but I was curious to
> see if the slides point out any subtleties of this approach that I
> should be aware of.
> 
> If I understand correctly, the blocking approach is as simple as the 
> following:
> 
> # note that diff is a 16x16 array
> source = f_in.root.data
> dest = f_in.createCArray('/', 'result', atom=tables.Float32Atom(),
> shape=source.shape)
> temp = np.empty((16, chunksize))
> for chunk in range(n_chunks):
>    block = source[:, chunk*chunksize:chunk*chunksize+chunksize]
>    result = np.dot(diff, block, out=temp)
>    result = scipy.signal.filtfilt(b, a, result)
>    dest[:, chunk*chunksize:chunk*chunksize+chunksize] = result
> 
> Thanks!
> Brad

Yes, this is the idea.
Francesc can surely provide very useful hints on this topic.
For my part, I suggest choosing the chunk shape very carefully
when you generate your datasets.
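
As a rough illustration of what "choosing the chunk shape" could mean for
column-block access like the one above, here is a minimal sketch that picks
a chunk holding all rows and as many columns as fit in a byte budget.  The
helper name column_chunkshape and the 1 MiB target are assumptions made for
this example, not PyTables recommendations.

```python
def column_chunkshape(n_rows, itemsize, target_bytes=1024 * 1024):
    """Return (n_rows, n_cols) so that one chunk spans every row and holds
    roughly target_bytes of data (hypothetical helper, not part of PyTables)."""
    # Bytes needed for one full column of n_rows elements.
    bytes_per_column = n_rows * itemsize
    # Always keep at least one column per chunk.
    n_cols = max(1, target_bytes // bytes_per_column)
    return (n_rows, n_cols)


# 16 rows of float32 (4 bytes each) with the assumed 1 MiB budget.
chunkshape = column_chunkshape(n_rows=16, itemsize=4)
print(chunkshape)  # (16, 16384)
```

The resulting tuple can be passed as the chunkshape argument when the
CArray is created, so that reading a block of columns touches whole chunks
rather than fragments of many chunks.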

Best regards

-- 
Antonio Valentino



[Pytables-users] Blocked access using PyTables and matrix multiplication

2011-12-10 Thread Brad Buran
I am trying to speed up some analysis routines on arrays that are
approximately 16 x 200,000,000 elements (stored in HDF5 arrays that
were originally created with PyTables).  I was looking into whether I
could speed up the analysis using tricks such as memmap and numexpr;
however, since I need to perform row-wise operations (e.g. computing
the dot product with a 16x16 array followed by a scipy.signal.filtfilt
operation) which require indexing, I do not believe I can use
numexpr.  This leaves "memmapping", but I understand that PyTables
offers something similar.  I found a very old discussion on this
mailing list 
(http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg01295.html),
but the link Francesc provided to the slides from Euro Scipy
describing how to use a blocking technique with PyTables no longer
works.  Does anyone have access to the original slides?

I'm assuming that the blocking technique is as simple as determining a
chunk size to operate on and then looping through the PyTables Array,
loading the chunk into memory, running np.dot and scipy.signal.filtfilt
and saving the result to a new PyTables array, but I was curious to
see if the slides point out any subtleties of this approach that I
should be aware of.

If I understand correctly, the blocking approach is as simple as the following:

# note that diff is a 16x16 array
source = f_in.root.data
dest = f_in.createCArray('/', 'result', atom=tables.Float32Atom(),
shape=source.shape)
temp = np.empty((16, chunksize))
for chunk in range(n_chunks):
   block = source[:, chunk*chunksize:chunk*chunksize+chunksize]
   result = np.dot(diff, block, out=temp)
   result = scipy.signal.filtfilt(b, a, result)
   dest[:, chunk*chunksize:chunk*chunksize+chunksize] = result
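
A minimal runnable sketch of the same loop, under some assumptions: the
names blocked_apply and func are hypothetical, small in-memory NumPy arrays
stand in for the HDF5 arrays so that it runs without a file, and only the
np.dot step is applied.  The function relies only on 2-D slicing and a
.shape attribute, so a PyTables array should be usable in place of either
NumPy array.

```python
import numpy as np


def blocked_apply(source, dest, func, chunksize):
    """Apply func to successive column blocks of source and store the
    results in the matching columns of dest (hypothetical helper)."""
    n_cols = source.shape[1]
    for start in range(0, n_cols, chunksize):
        # The last block may be shorter than chunksize.
        stop = min(start + chunksize, n_cols)
        block = source[:, start:stop]      # load only this block
        dest[:, start:stop] = func(block)  # write the processed block


# Stand-in data with illustrative sizes; 1050 is deliberately not a
# multiple of the chunk size so the short final block is exercised.
rng = np.random.default_rng(0)
diff = rng.standard_normal((16, 16))
source = rng.standard_normal((16, 1050))
dest = np.empty_like(source)

blocked_apply(source, dest, lambda block: np.dot(diff, block), chunksize=100)
```

The matrix product acts on each column independently, so the blocked
result agrees with np.dot(diff, source) on the whole array.  A filter such
as scipy.signal.filtfilt is different: each output sample depends on
neighbouring columns, so filtering block by block would likely need
overlapping blocks or similar handling at the block edges.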

Thanks!
Brad
