On 7/5/12 7:59 PM, Anthony Scopatz wrote:
On Thu, Jul 5, 2012 at 12:34 PM, Jacob Bennett
<[email protected]> wrote:
Hello Pytables Users,
I am currently hitting a maximum-number-of-children error in
PyTables. I am trying to store stock updates in HDF5. My
current schema is to have one file represent a trading day, each
table represent a particular instrumentID (stock id) and have each
record in the table belong to a specific update with a timestamp
(where the timestamp could be considered a primary key).
I am currently having all tables be direct descendants of root.
The problem with this is that per day I have the following stats:
#of tables ::= 20000
#of Records per table ::= 250000
The problem is that 20000 is too many children to attach to a
single node. Continuing with this schema
will consume an exorbitant amount of memory and lead to slower
query times.
Is there a way to redesign this schema so that it could work
better with pytables? Or is this simply too much data?
It certainly isn't too much data. HDF5 scales to petabytes ;)
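For a sense of scale, here is a back-of-the-envelope sketch using the figures quoted above. The 16-byte row size is purely an assumption for illustration (say, an int64 timestamp plus one float64 field); the thread does not give the actual record layout.

```python
# Figures taken from the original post.
TABLES_PER_DAY = 20_000
ROWS_PER_TABLE = 250_000

# Assumption, not from the thread: one int64 timestamp + one float64 value.
ASSUMED_ROW_BYTES = 16

rows_per_day = TABLES_PER_DAY * ROWS_PER_TABLE
raw_bytes_per_day = rows_per_day * ASSUMED_ROW_BYTES

print("rows per day:", rows_per_day)
print("uncompressed GB per day (assumed row size):", raw_bytes_per_day / 1e9)
```

Under that assumed row size, one trading day is a few billion rows and tens of gigabytes before compression.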
Would it help to keep the current schema and just increase
the depth of the tree by taking parts of the instrumentId
(instrumentId is an int64) as nodes?
Yes, this would be one approach that would work.
+1
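A minimal sketch of that idea, assuming non-negative integer IDs. The function name `instrument_path`, the `fanout` and `levels` parameters, and the `g`/`i` name prefixes are all hypothetical choices for illustration, not anything prescribed by PyTables.

```python
def instrument_path(instrument_id, fanout=16, levels=2):
    """Map an integer instrument ID to (group_path, table_name).

    Each level of the tree is named after one base-`fanout` digit of
    the ID, lowest digit first, so no group holds more than `fanout`
    subgroups and the tables spread over fanout**levels leaf groups.
    """
    if instrument_id < 0:
        raise ValueError("instrument_id must be non-negative")
    if fanout < 2 or levels < 1:
        raise ValueError("need fanout >= 2 and levels >= 1")

    width = len(str(fanout - 1))  # zero-pad so group names sort nicely
    parts = []
    remaining = instrument_id
    for _ in range(levels):
        parts.append("g%0*d" % (width, remaining % fanout))
        remaining //= fanout

    group_path = "/" + "/".join(parts)
    # Prefix with a letter so the name is a valid Python identifier.
    table_name = "i%d" % instrument_id
    return group_path, table_name


print(instrument_path(20000))
```

With the defaults, 20000 tables land in 256 leaf groups, roughly 78 tables each. In PyTables 3.x the resulting pair could be passed as `create_table(group_path, table_name, description, createparents=True)`; the 2.x releases spell the method `createTable`.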
Basically, nodes in HDF5 only get a fixed amount of storage for
metadata, including what children they have. (I believe this number
is 64 KB. In theory, it is possible to increase this number and
recompile HDF5, but then files generated in this way would only be
compatible with your altered version of the library.) So if a group
has so many children that storing their names and locations takes up
more than 64 KB, you have run out of room. By adding N other
subgroups to the hierarchy you increase the metadata available to N *
64 KB.
No, this is wrong. The hierarchy metadata is stored in a different
place from user metadata, and hence it is not affected by the 64 KB
limit. The problem is rather that having too many children hanging from
a single group hurts performance considerably (the same happens
with regular filesystems when a directory holds too many files).
--
Francesc Alted
_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users