Hi,
This looks useful. What you said about __array__ makes sense, but I didn't
see it in the code you linked.
Do you know when the Python netCDF4 library will support the numpy array interface
directly? I searched around for a roadmap but didn't find anything. It may
be best for me to proceed with a slightly clumsy interface for now and wait
until the array interface is built in for free.
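
For reference, here is a minimal sketch of the interim wrapper I have in mind.
It combines the __getitem__ wrapper with an __array__ method and one explicitly
listed method. The mean method and the fake_var stand-in are assumptions made
only for illustration and are not part of netCDF4; a real netCDF4.Variable of
compound type (two float64 fields) would take the stand-in's place.

```python
import numpy as np


class WrapComplex:
    """Lazily present a compound (real, imag) variable as complex values."""

    def __init__(self, nc_var):
        # nc_var: anything that returns a structured array of two float64
        # fields when indexed (assumed here: a compound netCDF4.Variable).
        self.nc_var = nc_var

    def __getitem__(self, item):
        # Read only the requested slice, then reinterpret it as complex128.
        return np.asarray(self.nc_var[item]).view(np.complex128)

    def __array__(self, dtype=None, copy=None):
        # Called by np.asarray(), so this reads and converts everything.
        arr = self[...]
        return arr if dtype is None else arr.astype(dtype)

    def mean(self, *args, **kwargs):
        # One explicitly listed method; others can be added the same way.
        return np.asarray(self).mean(*args, **kwargs)


# Stand-in for the netCDF variable so the sketch runs without a file.
pair = np.dtype([("real", np.float64), ("imag", np.float64)])
fake_var = np.zeros(4, dtype=pair)
fake_var["real"] = [1.0, 2.0, 3.0, 4.0]
fake_var["imag"] = [0.5, 0.5, 0.5, 0.5]

cplx_data = WrapComplex(fake_var)
print(cplx_data[:2])      # indexes the underlying variable with [:2] only
print(np.abs(cplx_data))  # numpy converts the wrapper via __array__
print(cplx_data.mean())   # uses the explicitly listed method
```

Indexing stays lazy, while anything that goes through np.asarray reads the
whole variable, so slicing first is still the way to go for large arrays.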

Thanks,
Glenn
On Mar 30, 2014 2:18 AM, "Stephan Hoyer" <sho...@gmail.com> wrote:

> Hi Glenn,
>
> Here is a full example of how we wrap a netCDF4.Variable object,
> implementing all of its ndarray-like methods:
>
> https://github.com/akleeman/xray/blob/0c1a963be0542b7303dc875278f3b163a15429c5/src/xray/conventions.py#L91
>
> The __array__ method would be the most relevant one for you: it means that
> numpy knows how to convert the wrapper array into a numpy.ndarray when you
> call np.mean(cplx_data). More generally, any function that calls
> np.asarray(cplx_data) will properly convert the values, which should
> include most functions from well-written libraries (including numpy and
> scipy). netCDF4.Variable doesn't currently have such an __array__ method,
> but it will in the next released version of the library.
>
> The quick and dirty hack to make all numpy methods work (now going beyond
> what the netCDF4 library implements) would be to add something like the
> following:
>
>     def __getattr__(self, attr):
>         return getattr(np.asarray(self), attr)
>
> But this is a little dangerous, since some methods might silently fail or
> give unpredictable results (e.g., those that modify data). It would be
> safer to list the methods you want to implement explicitly, or to just
> liberally use np.asarray. The latter is generally a good practice when
> writing library code anyway, to catch unusual ndarray subclasses like
> np.matrix.
>
> Stephan
>
>
> On Sat, Mar 29, 2014 at 8:42 PM, G Jones <glenn.calt...@gmail.com> wrote:
>
>> Hi Stephan,
>> Thanks for the reply. I was thinking of something along these lines but
>> was hesitant because while this provides clean access to chunks of the
>> data, you still have to remember to do cplx_data[:].mean() for example in
>> the case that you want cplx_data.mean().
>>
>> I was hoping to basically have all of the ndarray methods at hand without
>> any indexing, but then also being smart about taking advantage of the mmap
>> when possible. But perhaps your solution is the best compromise.
>>
>> Thanks again,
>> Glenn
>> On Mar 29, 2014 10:59 PM, "Stephan Hoyer" <sho...@gmail.com> wrote:
>>
>>> Hi Glenn,
>>>
>>> My usual strategy for this sort of thing is to make a light-weight
>>> wrapper class which reads and converts values when you access them. For
>>> example:
>>>
>>> import netCDF4
>>>
>>> class WrapComplex(object):
>>>     def __init__(self, nc_var):
>>>         self.nc_var = nc_var
>>>
>>>     def __getitem__(self, item):
>>>         # Read only the requested slice, then view it as complex.
>>>         return self.nc_var[item].view('complex')
>>>
>>> nc = netCDF4.Dataset('my.nc')
>>> cplx_data = WrapComplex(nc.groups['mygroup'].variables['cplx_stuff'])
>>>
>>> Now you can index cplx_data (e.g., cplx_data[:10]) and only the values
>>> you need will be read from disk and converted on the fly.
>>>
>>> Hope this helps!
>>>
>>> Cheers,
>>> Stephan
>>>
>>>
>>>
>>>
>>> On Sat, Mar 29, 2014 at 6:13 PM, G Jones <glenn.calt...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I am using netCDF4 to store complex data using the recommended strategy
>>>> of creating a compound data type with the real and imaginary parts. This
>>>> all works well, but reading the data into a numpy array is a bit clumsy.
>>>>
>>>> Typically I do:
>>>>
>>>> nc = netCDF4.Dataset('my.nc')
>>>> cplx_data = nc.groups['mygroup'].variables['cplx_stuff'][:].view('complex')
>>>>
>>>> which directly gives a nice complex numpy array. This is OK for small
>>>> arrays, but is wasteful if I only need some chunks of the array because it
>>>> reads all the data in, reducing the utility of the mmap feature of netCDF.
>>>>
>>>> I'm wondering if there is a better way to directly make a numpy array
>>>> view that uses the netcdf variable's memory mapped buffer directly. Looking
>>>> at the Variable class, there is no access to this buffer directly which
>>>> could then be passed to np.ndarray(buffer=...).
>>>>
>>>> Any ideas of simple solutions to this problem?
>>>>
>>>> Thanks,
>>>> Glenn
>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion@scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>
>>>>
>>>