>
>From: FrancescAlted <[email protected]>
>To: Discussion list for PyTables <[email protected]>
>Sent: Wed, March 23, 2011 10:57:06 AM
>Subject: Re: [Pytables-users] Problem writing strings to a CArray. Could this be a bug?
>
>
>2011/3/23 Adriano Vilela Barbosa <[email protected]>
>
>This is not a bug, but rather a feature of NumPy. Look at this:
>>
>>>>> import numpy as np
>>>>> a = np.array(['aa\x00\x00'])
>>>>> a[0]
>>'aa' # hey! where have my trailing 0's gone?
>>>>> a.data[:]
>>'aa\x00\x00' # yeah, they are still in the data area of the array
>>
>>I'd recommend using the byte ('i1') type to achieve what you want:
>>
>>>>> a.view('i1')
>>array([97, 97, 0, 0], dtype=int8)
>>
>>Thank you very much for your explanation, but I still don't get it.
>>
>>Let's forget numpy for a moment and just say I want to store the string
>>'aa\x00\x00' in a CArray. Each element of the CArray is a 4 element string.
>>First, I create the CArray:
>>
>>>>> import tables
>>>>> fid = tables.openFile("carray_test.hdf","w")
>>
>>>>> fid.createGroup("/", 'table', 'Binary table')
>>>>> array_atom = tables.StringAtom(itemsize=4)
>>>>> array_shape = (1,)
>>
>>>>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
>>
>>Now, I store the string 'aa\x00\x00' in the first row (which is the only row in
>>this example) of the CArray:
>>
>>>>> fid.root.table.bin_table[0] = 'aa\x00\x00'
>>
>>Now, I do
>>
>>>>> fid.root.table.bin_table[0].data[:]
>>'aa'
>>
>>So, it looks to me that the trailing \x00 elements of the string are not being
>>stored in the CArray. From my side, there's no numpy involved; I'm just trying
>>to store a string. What am I missing?
>>
You cannot avoid NumPy because PyTables uses NumPy behind the scenes as an
intermediate buffer area. What you are seeing is probably a side effect
caused by the 'feature' I mentioned before. Any reason why you don't want to
use a byte type instead of a string?
--
>FrancescAlted
>
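The NumPy behaviour described above can be reproduced without PyTables. What follows is only a minimal sketch, assuming Python 3 and a reasonably recent NumPy; the explicit dtype='S4' and the variable names are illustrative choices and not part of the original exchange:

```python
import numpy as np

# Fixed-width byte string of 4 bytes; the dtype is given explicitly
# so the item size does not depend on how NumPy infers it.
a = np.array([b'aa\x00\x00'], dtype='S4')

# Element access returns the value with trailing NUL bytes stripped.
element = a[0]

# The underlying buffer still holds all four bytes.
raw = a.tobytes()

# Viewing the same memory as signed bytes exposes every byte, NULs included.
as_bytes = a.view('i1')

print(repr(element), repr(raw), as_bytes)
```

The view shares memory with the original array, so no data is copied; only the interpretation of the buffer changes.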
Hi again,
I'm happy to use bytes instead of strings. The reason I was using strings is
that, as someone new to Python and numpy, I thought strings were the only way
of dealing with individual bytes. Also, because of this problem I'm having with
strings, I tried storing the numpy arrays directly in the HDF file, but the
performance was considerably worse and the file considerably larger.
So, going back to my previous example, I guess the only things I need to change
are the Atom object used to construct the CArray and the numpy array method:
view() instead of tostring().
>>> import numpy
>>> import tables
>>> fid = tables.openFile("carray_test.hdf","w")
>>> fid.createGroup("/", 'table', 'Binary table')
>>> array_atom = tables.Atom.from_dtype(numpy.dtype((numpy.byte, (4,))))
>>> array_shape = (1,)
>>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
>>> a = numpy.array(['aa\x00\x00'])
>>> fid.root.table.bin_table[0] = a.view('b')
>>> fid.root.table.bin_table[0].data[:]
'aa\x00\x00'
Is this right, or is there a more efficient way of doing it?
Thank you very much. Your help is greatly appreciated.
Adriano
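One possible alternative to building a string array and then calling view() is to wrap the raw bytes directly. This is a sketch only, assuming Python 3 and NumPy; the name payload is hypothetical, and whether this is actually faster for a given workload has not been measured here:

```python
import numpy as np

# Hypothetical 4-byte record containing trailing NULs.
payload = b'aa\x00\x00'

# Interpret the bytes as int8 without copying; the result is read-only.
row = np.frombuffer(payload, dtype=np.int8)

# Converting back yields the original bytes, trailing NULs intact.
restored = row.tobytes()

print(row, repr(restored))
```

An array like row has the same shape and type as the result of a.view('b') in the session above, so it should be assignable to the same CArray row.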