Re: [Pytables-users] Re: More on dimension scales

Francesc Altet Thu, 19 Jan 2006 05:11:15 -0800

On Thursday 19 January 2006 10:54, [EMAIL PROTECTED] wrote:
> Wow! You're perfectly right! Then I was wrong when I repeatedly asserted
> that the attributes can't be retrieved from a reference, or from an ID
> returned by "H5Rdereference(...)" : please accept my apologies for this.


No problem.

> Thanks to "get_attribute_string_sys", I have been able to successfully
> retrieve the 'CLASS' attribute from a reference, and the attributes
> 'TITLE', 'VERSION', 'FLAVOR' too. Then it will be easy to build the primary
> Pytable object.
> The only attributes I was not able to correctly retrieve are the
> 'REFERENCE_LIST' and 'DIMENSION_LIST' (as they are not strings). But it
> should be easy to work around this problem.

In fact, you will only need to retrieve the 'CLASS' attribute. With
this, and with some modifications (see below) to the way a primary
PyTables object is currently built, you (and the users) will be able to
get the other attributes directly from this primary object.
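As a rough illustration of why the 'CLASS' attribute alone is enough to decide which kind of primary object to build, here is a minimal pure-Python sketch. The node classes, the CLASS_MAP table and node_from_reference() are all hypothetical stand-ins, not the PyTables API; the dataset ID is assumed to come from dereferencing an HDF5 reference, and the class string from reading the 'CLASS' attribute.

```python
# Hypothetical node classes standing in for the real PyTables ones.
class Node:
    def __init__(self, dataset_id):
        # ID obtained (by assumption) from dereferencing an HDF5 reference
        self.dataset_id = dataset_id


class Array(Node):
    pass


class EArray(Node):
    pass


class VLArray(Node):
    pass


class Table(Node):
    pass


# Hypothetical mapping from the value of the 'CLASS' attribute to a class.
CLASS_MAP = {
    "ARRAY": Array,
    "EARRAY": EArray,
    "VLARRAY": VLArray,
    "TABLE": Table,
}


def node_from_reference(dataset_id, class_attr):
    """Build a primary object from a dataset ID and its 'CLASS' attribute."""
    try:
        node_class = CLASS_MAP[class_attr]
    except KeyError:
        raise ValueError("unsupported CLASS attribute: %r" % class_attr)
    return node_class(dataset_id)


obj = node_from_reference(42, "TABLE")
print(type(obj).__name__, obj.dataset_id)  # Table 42
```

Once the right class is instantiated, the remaining attributes can be read through that object rather than one by one from the raw ID.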

> I have just a last question. Let's say I build a PyTable object from a
> reference. This new PyTable object (let's call it 'new_object') will have
> the same numarray of values as the original object (let's call it
> 'original_object'), and the same attributes. Now let's say I modify an
> attribute of 'new_object'. Will the attribute of 'original_object' be
> modified ?
> If I'm correct, the answer is "no".
> As far as "system attributes" ('CLASS', 'TITLE', 'FLAVOR'...) are
> concerned, that's not a problem, given that these attributes must not be
> modified. But as far as "user attributes" are concerned, this could raise
> problems. So that's why I think 'new_object' shouldn't own only the
> "system_attributes" of 'original_object'.

Well, we have to distinguish here between HDF5 attributes (disk-based)
and Python attributes (memory-based). When you modify (or add) a
Python attribute in "new_object", "original_object" will know
nothing about it. However, if you modify an HDF5 attribute in
"new_object", you modify it *on disk*, which means that when
"original_object" reads this attribute (using the
AttributeSet instance attached in ._v_attrs), it will get the new
value.
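The distinction can be modelled in a few lines of plain Python. This is a toy model, not PyTables code: a shared dict stands in for the attributes stored in the HDF5 file, and DatasetView is a hypothetical class playing the role of two objects opened on the same dataset (no caching yet).

```python
# A dict standing in for the attributes stored on disk in the HDF5 file.
DISK = {"units": "m"}


class DatasetView:
    """Hypothetical stand-in for a node opened on a given dataset."""

    def __init__(self, disk):
        self._disk = disk  # shared "on-disk" storage

    def set_hdf5_attr(self, name, value):
        self._disk[name] = value  # written to the shared store

    def get_hdf5_attr(self, name):
        return self._disk[name]  # read straight from the shared store


original_object = DatasetView(DISK)
new_object = DatasetView(DISK)

# A plain Python attribute lives only on the instance that sets it.
new_object.note = "scratch"
print(hasattr(original_object, "note"))  # False

# An "HDF5" attribute goes to the shared store, so both objects see it.
new_object.set_hdf5_attr("units", "km")
print(original_object.get_hdf5_attr("units"))  # km
```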

However, to complicate things further, when the AttributeSet
class reads an attribute, it caches it in memory (for efficiency
reasons). So, if "original_object" already has an attribute in its
cache and "new_object" then modifies its value, "original_object"
keeps the old cached value, because there is no way for "new_object"
to tell "original_object" to refresh its cache. As a result,
"new_object" and "original_object" will effectively see *different*
values for this specific attribute.
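To make the stale-cache effect concrete, here is a toy model (not the real AttributeSet): CachedAttributeSet is a hypothetical class that caches each value on first read and, on write, updates the shared "disk" dict plus its *own* cache only.

```python
class CachedAttributeSet:
    """Toy attribute set that caches values after the first read."""

    def __init__(self, disk):
        self._disk = disk  # shared "on-disk" storage
        self._cache = {}   # private, per-object cache

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = self._disk[name]  # first read hits "disk"
        return self._cache[name]

    def set(self, name, value):
        self._disk[name] = value   # write through to "disk"
        self._cache[name] = value  # but refresh only this object's cache


disk = {"units": "m"}
original_attrs = CachedAttributeSet(disk)
new_attrs = CachedAttributeSet(disk)

original_attrs.get("units")   # original_attrs now caches "m"
new_attrs.set("units", "km")  # the "disk" now holds "km"

print(new_attrs.get("units"))       # km
print(original_attrs.get("units"))  # m  (stale cached value)
```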

While this can be a bit disappointing for the user, I don't see it as
a big issue: documenting it properly should be enough.
Another possibility would be to disable the cache for HDF5 attributes,
but I won't take this step until it becomes apparent that the
de-synchronization between the HDF5 attributes of "native" objects
(i.e. those accessible from the object tree) and "referenced" objects
causes bigger difficulties.
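The "disable the cache" alternative can be sketched with the same kind of toy model. The use_cache flag below is a hypothetical option, not an existing PyTables parameter; with it turned off every read goes back to the shared store, so the two objects stay in sync at the cost of a disk access per read.

```python
class AttributeSet:
    """Toy attribute set with an optional (hypothetical) cache switch."""

    def __init__(self, disk, use_cache=True):
        self._disk = disk
        self._use_cache = use_cache
        self._cache = {}

    def get(self, name):
        if not self._use_cache:
            return self._disk[name]  # always re-read from "disk"
        if name not in self._cache:
            self._cache[name] = self._disk[name]
        return self._cache[name]

    def set(self, name, value):
        self._disk[name] = value
        if self._use_cache:
            self._cache[name] = value


disk = {"units": "m"}
original_attrs = AttributeSet(disk, use_cache=False)
new_attrs = AttributeSet(disk, use_cache=False)

original_attrs.get("units")   # reads "m", nothing is cached
new_attrs.set("units", "km")
print(original_attrs.get("units"))  # km
```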

Finally, I'd like to give you a couple of recommendations so that you
can re-create true primary PyTables objects from their HDF5
references. First, the node name currently plays a key role in the
creation of actual nodes (see the Node.__init__() constructor in
Node.py). My suggestion is to add a new parameter to the
Node.__init__() method, say dataset_id, and add logic to it
so that the Node can be created either from a (parentNode, name)
pair or from the new dataset_id. Of course, as you don't have information
about the name of the *referenced* dataset, you should avoid calling
the hook that puts the object in the object tree, i.e. avoid the call:

        self._g_setLocation(parentNode, ptname, h5name)
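Here is a minimal pure-Python sketch of that constructor logic, assuming a much simplified Node: the Group class, the children dict and this _g_setLocation body are hypothetical stand-ins for the real tree machinery. The point is only the branching, where a node built from a dataset_id never registers itself in the tree.

```python
class Group:
    """Hypothetical minimal parent group holding its children by name."""

    def __init__(self):
        self.children = {}


class Node:
    """Simplified, hypothetical Node; not the real PyTables class."""

    def __init__(self, parentNode=None, name=None, dataset_id=None):
        if dataset_id is not None:
            # Built from a dereferenced reference: name and parent are
            # unknown, so the node is NOT hooked into the object tree.
            self.dataset_id = dataset_id
            self._v_name = None
            self._v_parent = None
        elif parentNode is not None and name is not None:
            self.dataset_id = None  # to be opened later from (parent, name)
            # Here the Python name and the HDF5 name are assumed equal.
            self._g_setLocation(parentNode, name, name)
        else:
            raise TypeError("need either (parentNode, name) or dataset_id")

    def _g_setLocation(self, parentNode, ptname, h5name):
        # Stand-in for the hook that registers the node in the tree.
        self._v_parent = parentNode
        self._v_name = ptname
        self._v_hdf5name = h5name
        parentNode.children[ptname] = self


root = Group()
by_name = Node(root, "temperature")  # placed in the tree
by_ref = Node(dataset_id=1234)       # detached, known only by its ID
print(sorted(root.children))  # ['temperature']
```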

Then, you should go to hdf5Extension.pyx and allow the initializer
there (Node._g_new) to do the same as its Python counterpart, that
is, allow the instance to be initialized from a dataset_id instead
of from the (where, name) pair. Here is an example of what I
mean:

  def _g_new(self, where, name, dataset_id, init):
    if dataset_id > 0:
      self.dataset_id = dataset_id
      """The dataset ID for this node. In this case it will not have
      name or parent_id attributes."""
    else:
      self.dataset_id = 0
      self.name = strdup(name)
      """The name of this node in its parent group."""
      self.parent_id = where._v_objectID
      """The identifier of the parent group."""

And, finally, you should make the open method of the *Array objects
(_openArray(self)) take advantage of self.dataset_id if it already
exists. Something like:

    # Open the dataset by name only if no ID was supplied (e.g. from a reference)
    if self.dataset_id == 0:
      self.dataset_id = H5Dopen(self.parent_id, self.name)

instead of the current

    self.dataset_id = H5Dopen(self.parent_id, self.name)
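The effect of that guard can be checked with a small Python mock. Everything here is hypothetical: fake_H5Dopen only imitates H5Dopen by handing out new positive IDs, and ArraySketch mirrors the attributes used above (parent_id, name, dataset_id, with 0 meaning "not opened yet").

```python
import itertools

_next_id = itertools.count(100)


def fake_H5Dopen(parent_id, name):
    """Stand-in for HDF5's H5Dopen(): returns a fresh positive dataset ID."""
    return next(_next_id)


class ArraySketch:
    def __init__(self, parent_id=None, name=None, dataset_id=0):
        self.parent_id = parent_id
        self.name = name
        self.dataset_id = dataset_id  # 0 means "not opened yet"

    def _openArray(self):
        # Open by (parent, name) only if no ID was handed in already.
        if self.dataset_id == 0:
            self.dataset_id = fake_H5Dopen(self.parent_id, self.name)
        return self.dataset_id


by_name = ArraySketch(parent_id=1, name="x")
by_ref = ArraySketch(dataset_id=555)
print(by_name._openArray())  # 100
print(by_ref._openArray())   # 555
```

A second call on by_name returns the same ID instead of reopening the dataset, since dataset_id is no longer 0.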
    
After this, all the writing and reading logic should work fine,
because it uses self.dataset_id. Perhaps in some (few, I think)
cases the name will be needed as well, but I hope that you will be
able to handle those situations in the same way as above.

Well, I know that this may seem a bit cumbersome to somebody who is
still learning about the intricacies of PyTables, but I think that
what I've described is feasible and not too difficult (in the sense
that it doesn't require a massive restructuring of the PyTables
logic). If you feel up to tackling this, then good luck! Ivan and I
will be glad to help you if you get stuck.

See you soon,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"




_______________________________________________
Pytables-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/pytables-users
