Hi Francesc

Thanks for your help in resolving this confusion.

Does your answer imply that, even if I had numpy installed and somehow enabled
it, my problem with string column indexes would not go away?

Regards
Andrei

PS: have you seen this site: http://wilmott.com/categories.cfm?catid=10? People
there need something to deal with huge datasets, and some of them are familiar
with Python. If you could figure out how to cure the slow buildup of string
indexes (they are widely used because almost all traded securities are
identified by string IDs, not ints), you would be rich very soon :)

PPS: this is instead of a success story. Hopefully I'll have one to share
sooner or later.


-----Original Message-----
From: Francesc Altet [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 08, 2006 3:11 AM
To: Smirnov, Andrei
Cc: '[email protected]'
Subject: Re: [Pytables-users] String columns indexing

Hi Andrei,

On Thu, 07/09/2006 at 16:42 -0400, Smirnov, Andrei wrote:
> Hello everybody!
> 
> I am using Python 2.4, PyTables 1.3.2, numarray 1.5.1 and HDF5 1.6.5 on
> Linux 2.4.
> 
> Am I right that index creation for string columns is much, much slower
> than index creation for integer columns?

Yes, this is due to the slow sorting method for numarray strings:

In [46]: a=numpy.arange(10000, dtype="byte")
In [47]: a.tofile('test.bin')
In [55]: Timer("b=a.sort()", "import numpy;a=numpy.fromfile('test.bin', dtype='S10')").repeat(3,100)
Out[55]: [0.15041804313659668, 0.10844111442565918, 0.10880494117736816]
In [56]: Timer("b=a.sort()", "import numarray.strings; a=numarray.strings.fromfile('test.bin', itemsize=10)").repeat(3,100)
Out[56]: [2.4770519733428955, 2.4263198375701904, 2.4242000579833984]

That is, sorting strings with numarray is roughly 20x slower than with numpy.
Of course, once numpy is at the core of PyTables, indexing times will hopefully
be much better.

> 
> I could replace the string column with an integer column that holds an
> index into a separate string table. Is that the best I can do for the moment?
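The integer-code idea above can be sketched in a few lines of plain Python. This is a minimal sketch under assumptions, not PyTables code: the names StringCodeTable, build_sorted_index and rows_for are hypothetical, and a sorted list of (code, row) pairs stands in for the index PyTables would build on the integer column.

```python
from bisect import bisect_left, bisect_right


class StringCodeTable:
    """Hypothetical helper: map each distinct string ID to a dense integer code."""

    def __init__(self):
        self._code_of = {}   # string ID -> integer code
        self._strings = []   # integer code -> string ID

    def encode(self, s):
        """Return the code for s, assigning the next free code if s is new."""
        code = self._code_of.get(s)
        if code is None:
            code = len(self._strings)
            self._code_of[s] = code
            self._strings.append(s)
        return code

    def lookup(self, s):
        """Return the code for s, or None if s was never encoded."""
        return self._code_of.get(s)

    def decode(self, code):
        """Return the string ID stored under code."""
        return self._strings[code]


def build_sorted_index(codes):
    """Sort (code, row) pairs; a stand-in for an index on the integer column."""
    return sorted((code, row) for row, code in enumerate(codes))


def rows_for(index, code):
    """Return the row numbers whose column value equals code."""
    lo = bisect_left(index, (code, -1))             # rows are >= 0
    hi = bisect_right(index, (code, float("inf")))  # past the last match
    return [row for _, row in index[lo:hi]]
```

For example, encoding the IDs "IBM", "MSFT", "IBM", "ORCL" gives the codes 0, 1, 0, 2, and rows_for(index, table.lookup("IBM")) returns rows 0 and 2. Only integers are ever sorted; the strings are touched once, when they are encoded.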

Well, if what you want is to search strings as keys in a dictionary, you can
follow a similar strategy: compute a hash of each string (for example, with the
Python builtin hash()) and store that value in an Int32 column (or an Int64
column, if you are on a 64-bit platform). For integers (and, in general, for
anything that is not a string), sorting speeds in numarray and numpy are similar:

In [57]: Timer("b=a.sort()", "import numarray; a=numarray.fromfile('test.bin', type='Int32')").repeat(3,100)
Out[57]: [0.030822992324829102, 0.03096318244934082, 0.031370878219604492]
In [58]: Timer("b=a.sort()", "import numpy;a=numpy.fromfile('test.bin', dtype='int32')").repeat(3,100)
Out[58]: [0.094920158386230469, 0.038717985153198242, 0.038733959197998047]
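The hash-key strategy can be sketched the same way. This is an illustration under assumptions, not PyTables API: the helper names int64_key, build_hash_index and find_rows are hypothetical, and a sorted list stands in for an indexed Int64 column. It also swaps hashlib.blake2b in for the builtin hash(), because since Python 3.3 string hashes are randomized per process (unless PYTHONHASHSEED is fixed), so keys written to disk in one run would not match keys computed in another. Since two different strings can share a hash, each candidate row is checked against the stored string.

```python
import hashlib
from bisect import bisect_left


def int64_key(s):
    """Stable signed 64-bit hash of s (same value in every process)."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little", signed=True)


def build_hash_index(ids):
    """Return (key, row) pairs sorted by key, mimicking an indexed Int64 column."""
    return sorted((int64_key(s), row) for row, s in enumerate(ids))


def find_rows(index, ids, target):
    """Return rows whose string ID equals target, filtering hash collisions."""
    key = int64_key(target)
    pos = bisect_left(index, (key, -1))  # first entry with this key
    rows = []
    while pos < len(index) and index[pos][0] == key:
        row = index[pos][1]
        if ids[row] == target:  # different strings may share a key
            rows.append(row)
        pos += 1
    return rows
```

With ids = ["IBM", "MSFT", "IBM", "ORCL"], find_rows(build_hash_index(ids), ids, "IBM") yields rows 0 and 2, and an ID that is not present yields an empty list. Unlike the code-table approach, no separate string dictionary has to be maintained, at the cost of the collision check.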


HTH,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"





_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users
