Francesc Altet wrote:
> On Thursday 12 April 2007 18:18, Michael Hoffman wrote:
>> Francesc Altet wrote:
>>> On Monday 09 April 2007 15:57, Michael Hoffman wrote:
>>>> As a followup to my previous message, I have realized that I am supposed
>>>> to tune the lustre filesystem for large files. Hopefully that will solve
>>>> my performance problems.
>>> Maybe. A good crosscheck would be to copy the file to a local filesystem
>>> and test the performance there. If you still see high latency, please
>>> describe the hierarchy you have given your data and I'll try to give you
>>> more feedback.
>> Well, I tried that and it was still really slow. So I tried balancing
>> the tree by creating groups named _00 through _ff, from the first octet
>> of the MD5 digest of the dataset name. This afforded a considerable
>> speedup in opening even on a remote filesystem:
>>
>> $ time python -c 'import tables; tables.openFile("original.h5")'
>> Closing remaining opened files... original.h5... done.
>>
>> real 2m25.643s
>> user 0m1.271s
>> sys 0m1.379s
>>
>> $ time python -c 'import tables; tables.openFile("balanced.h5")'
>> Closing remaining opened files... balanced.h5... done.
>>
>> real 0m2.186s
>> user 0m0.158s
>> sys 0m0.106s
>>
>> So perhaps sticking to <4096 nodes per group (or here, <256) is still a
>> good idea. I'm thankful that I don't need to move to multiple files
>> which would have been a real pain. It would be nice if this sort of
>> thing were done automatically but that would probably be best handled
>> upstream in HDF5.
>
> I see. So, in the end the PerformanceWarning that was issued some time ago
> when too many nodes were put in a single group was not a bad idea...
>
> In any case, could you describe in more detail the tree structure of
> 'original.h5' and how you changed it for 'balanced.h5'? I'd like to
> figure out what's going on there, to see whether it is worth bringing
> the PerformanceWarning back.

This is using PyTables 1.4 and numarray, so I am not yet sure how it
will apply to PyTables 2.0 and numpy.

original.h5:
  root:
    22,714 (2000, 12) arrays of Float64

balanced.h5:
  root:
    256 groups with approximately the same number of children:
      total 22,714 (2000, 12) arrays of Float64
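
The bucketing itself could be sketched roughly as below. This is only an
illustration of the idea (group name = "_" plus the first octet of the
MD5 digest of the dataset name, in hex), written for a current Python 3
rather than the interpreter used above. The names bucket_name and
balance, and the "array%05d" dataset names, are made up for the example
and are not taken from the real files.

```python
import hashlib
from collections import defaultdict


def bucket_name(dataset_name):
    """Map a dataset name to one of 256 group names, "_00" through "_ff"."""
    # First octet of the MD5 digest of the name, formatted as two hex digits.
    digest = hashlib.md5(dataset_name.encode("utf-8")).digest()
    return "_%02x" % digest[0]


def balance(dataset_names):
    """Group dataset names by bucket; returns {group name: [dataset names]}."""
    buckets = defaultdict(list)
    for name in dataset_names:
        buckets[bucket_name(name)].append(name)
    return dict(buckets)


# Hypothetical dataset names, standing in for the 22,714 real ones.
names = ["array%05d" % i for i in range(22714)]
buckets = balance(names)

# Report how many groups are in use and the size of the fullest one.
print(len(buckets), "groups; largest holds",
      max(len(members) for members in buckets.values()), "datasets")
```

With 22,714 arrays spread over at most 256 groups, the average is about
89 children per group (22714 / 256), far below the 4096 figure. In the
HDF5 file, each key of the dictionary would become a group under the
root, and each array would be created inside its group instead of
directly in the root.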
--
Michael Hoffman