Hello Anthony,

Thanks for the reply! I was confused about the difference between
in-memory objects and file attributes. I was able to store an EntityInfo
instance as an HDF5 attribute of the table, and the information persisted.

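import tables

class EntityInfo(object):
    def __init__(self, last_update):
        # Instance attribute, so the value is part of the pickled state
        self.last_update = last_update

h5file = tables.openFile('data.h5', mode='r+')
e_info = EntityInfo('2012-06-11T00:00:00Z')
# Assign through .attrs so the object is written to the HDF5 file
h5file.root.data.dataset1.attrs.e_info = e_info

...
h5file.close()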

And when I opened the file again I was able to access the e_info object on
the table.
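A note on why the value should be an instance attribute: PyTables stores general Python objects in `attrs` by pickling them, and pickle records only an instance's own attributes plus a reference to its class. A value defined only at class level (like `last_update = '2012-06-11T00:00:00Z'` in a class body) is therefore never written to the file; it merely reappears because the class definition is imported again when the attribute is unpickled. The sketch below uses only the standard `pickle` module, with no HDF5 file involved, to show the difference; the class name `EntityInfoClassAttr` is hypothetical.

```python
import pickle


class EntityInfoClassAttr(object):
    # Hypothetical: the value lives on the class, not on the instance
    last_update = '2012-06-11T00:00:00Z'


class EntityInfo(object):
    def __init__(self, last_update):
        # The value lives in the instance __dict__, which pickle saves
        self.last_update = last_update


# Pickle both objects the way a general Python attribute would be serialized
class_attr_bytes = pickle.dumps(EntityInfoClassAttr())
instance_attr_bytes = pickle.dumps(EntityInfo('2012-06-11T00:00:00Z'))

# Only the second pickle actually contains the timestamp
print(b'2012-06-11T00:00:00Z' in class_attr_bytes)     # False
print(b'2012-06-11T00:00:00Z' in instance_attr_bytes)  # True
```

If the class definition later changes, the class-attribute version would silently report the new value, while the instance-attribute version keeps what was actually saved.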

On Mon, Jun 11, 2012 at 3:47 PM, Anthony Scopatz <scop...@gmail.com> wrote:

> On Mon, Jun 11, 2012 at 2:00 PM, Aquil H. Abdullah <
> aquil.abdul...@gmail.com> wrote:
>
>> Hello All,
>>
>> I've recently started using PyTables and I am very excited about its
>> speed and ease of use for large datasets. However, I have a problem that I
>> have not been able to solve with regard to user-defined table attributes.
>>
>> I have a table that contains observations about entities that can be
>> classified as different types. The timestamp of the last observation may
>> differ from one entity type to another. For processing this table, I would
>> like to determine the timestamp of the last observation for each entity
>> type. The problem is easy as long as I know the entity types in advance.
>> For example:
>>
>> import tables
>> h5file = tables.openFile('data.h5',mode='r+')
>> tbl = h5file.getNode('/series','data1')
>> last_obs = max(x['timestamp'] for x in tbl.where("""entity_type=='e1'"""))
>>
>> However, my problem is that, as I read from my source, I may not always
>> know the entity types beforehand. I was going to add a last_observation
>> attribute to my table; however, I found
>> https://github.com/PyTables/PyTables/issues/145, which says that
>> attributes aren't persistent.
>>
>
> Hello Aquil,
>
> This issue only applies to instance attrs on the in-memory object.
>
>
>> So I have two questions:
>>
>> 1. Are there any user-defined attributes that are persistent?
>>
>
> Yes, these are the HDF5 attributes of a node. You have to access them
> through the "attrs" namespace. To use your example above:
>
> tbl.attrs.last_obs = 42.0
>
> See
> http://pytables.github.com/usersguide/libref.html?highlight=attrs#the-attributeset-class
> for more info.
>
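To check that an attribute set through `attrs` survives closing and reopening the file, here is a small round trip. It assumes PyTables 3.x, where the `openFile`/`getNode` names used in this thread are spelled `open_file`/`get_node`; the `Observation` row description and the temporary file path are made up for the example.

```python
import os
import tempfile

import tables  # PyTables 3.x naming: open_file / create_table / get_node


class Observation(tables.IsDescription):
    # Hypothetical row layout for illustration
    entity_type = tables.StringCol(8)
    timestamp = tables.Float64Col()


path = os.path.join(tempfile.mkdtemp(), 'data.h5')

# Write: create the table and set a user attribute through .attrs
with tables.open_file(path, mode='w') as h5file:
    tbl = h5file.create_table('/series', 'data1', Observation,
                              createparents=True)
    tbl.attrs.last_obs = 42.0

# Reopen: the attribute was stored in the HDF5 file, not on the Python object
with tables.open_file(path, mode='r') as h5file:
    tbl = h5file.get_node('/series', 'data1')
    last_obs = tbl.attrs.last_obs
    print(last_obs)  # 42.0
```

Setting `tbl.last_obs = 42.0` instead (without `.attrs`) would only attach a Python attribute to the in-memory node object, which is the case described in issue 145.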
>
>> 2. Does anyone have any other suggestions, besides separating the
>> entities into separate tables where I could then just take the max of the
>> timestamp field/col?
>>
>
> You could also use numpy.unique() to figure out the entity values and
> then itertools.groupby() to separate the data out (groupby might not
> be the fastest thing to do here). Or just use the where() method from
> above for each entity. The point is that you want the unique values of the
> entity type column only:
>
> entity_types = np.unique(tbl.cols.entity_type[:])
>
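Putting these suggestions together, a minimal in-memory sketch, assuming the rows have been read into a NumPy structured array with `entity_type` and `timestamp` fields (for example with `tbl.read()`). `last_obs_by_type` takes the unique types and then does one masked max per type, the in-memory analogue of calling `where()` once per entity; `last_obs_groupby` is the `itertools.groupby()` variant, which needs the rows sorted by type first because `groupby` only merges adjacent equal keys. Both function names and the sample data are hypothetical.

```python
from itertools import groupby

import numpy as np


def last_obs_by_type(rows):
    """Return {entity_type: latest timestamp} using np.unique plus a mask."""
    result = {}
    # Unique values of the entity type column only
    for etype in np.unique(rows['entity_type']):
        mask = rows['entity_type'] == etype
        result[bytes(etype)] = float(rows['timestamp'][mask].max())
    return result


def last_obs_groupby(rows):
    """Same result via itertools.groupby on rows sorted by entity type."""
    # groupby only groups *adjacent* equal keys, so sort by type first
    ordered = np.sort(rows, order='entity_type')
    return {
        bytes(etype): float(max(r['timestamp'] for r in group))
        for etype, group in groupby(ordered, key=lambda r: r['entity_type'])
    }


rows = np.array(
    [(b'e1', 10.0), (b'e2', 11.0), (b'e1', 12.0), (b'e3', 9.0), (b'e2', 15.0)],
    dtype=[('entity_type', 'S8'), ('timestamp', 'f8')],
)
print(last_obs_by_type(rows))  # {b'e1': 12.0, b'e2': 15.0, b'e3': 9.0}
print(last_obs_groupby(rows))  # {b'e1': 12.0, b'e2': 15.0, b'e3': 9.0}
```

The masked version scans the timestamp column once per type, so it is simple but can get slow with many types; the groupby version pays for one sort and then a single pass.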
> Another thing: if the times are roughly chronological and the entities
> are evenly dispersed, you could probably get away with reading only the
> end of the table, which makes things faster:
>
> entity_types = np.unique(tbl.cols.entity_type[-100:])
>
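A sketch of the tail-only idea, which goes one step further and also picks up each type's latest timestamp: it scans the last rows backwards and keeps the first timestamp it sees for each type. It assumes the rows are stored in chronological order and that the tail has been read into a NumPy structured array (in PyTables, slicing the table, e.g. `tbl[-100:]`, returns one). The function name and sample data are hypothetical.

```python
import numpy as np


def last_obs_from_tail(tail):
    """Per-type latest timestamp using only the rows in `tail`.

    Assumes chronological order, so the first row met for each type while
    scanning backwards is that type's most recent observation. Types that
    do not appear in `tail` are missing from the result.
    """
    latest = {}
    for row in tail[::-1]:
        etype = bytes(row['entity_type'])
        if etype not in latest:
            latest[etype] = float(row['timestamp'])
    return latest


rows = np.array(
    [(b'e1', 10.0), (b'e2', 11.0), (b'e3', 12.0), (b'e1', 13.0), (b'e2', 14.0)],
    dtype=[('entity_type', 'S8'), ('timestamp', 'f8')],
)
print(last_obs_from_tail(rows[-3:]))  # {b'e2': 14.0, b'e1': 13.0, b'e3': 12.0}
print(last_obs_from_tail(rows[-2:]))  # {b'e2': 14.0, b'e1': 13.0}
```

The second call shows the trade-off: with too small a window, `e3` is silently dropped, so the window size balances speed against completeness.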
> I hope this helps!
> Be Well
> Anthony
>
>
>> --
>> Aquil H. Abdullah
>> aquil.abdul...@gmail.com
>>
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Pytables-users mailing list
>> Pytables-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>
>>
>
>


-- 
Aquil H. Abdullah
aquil.abdul...@gmail.com