Hi Stephen,

On Wednesday, 21 March 2007 at 18:41 +1100, Stephen Simmons wrote:

> Hi everyone,
>
> I've just started using the PyTables 2.0 beta, and am very impressed,
> especially with the complex where clauses possible using numexpr.
>
> I have hit one stumbling block in one of my tables, though: selecting on
> a uint64 column. This fails with:
>
>     <type 'exceptions.NotImplementedError'>: variable ``N_ACCT`` refers to
>     a 64-bit unsigned integer column, not yet supported in conditions,
>     sorry; please use regular Python selections
>     WARNING: Failure executing file: <RBPresults.py>
>
> Maybe this has been/will be fixed in a newer version of numexpr.
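Before getting to whether numexpr will fix this: the "regular Python selections" fallback that the error message suggests just means filtering rows in ordinary Python instead of with an in-kernel (numexpr) condition. A minimal sketch, with invented sample rows and only the `N_ACCT` column name taken from the error above (with a real PyTables table you would iterate over the table object the same way):

```python
# A plain-Python selection: filter rows without numexpr.
# The rows below are made-up sample data for illustration.
rows = [
    {'N_ACCT': 100},
    {'N_ACCT': 2**40},   # a large uint64 value, outside the range below
    {'N_ACCT': 150},
]
acct_min, acct_max = 50, 200

selected = [r['N_ACCT'] for r in rows
            if acct_min <= r['N_ACCT'] <= acct_max]
print(selected)  # the two accounts inside the range
```

This is slower than an in-kernel query, of course, but it works for any column type.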
Well, I'm not so sure. Numexpr developers are quite conservative about adding support for more datatypes than strictly necessary, because they fear (with good reason) that doing so could hurt performance. In fact, this has already been observed in the improved version that ships with PyTables:

http://article.gmane.org/gmane.comp.python.numeric.general/14130/match=numexpr

There is also the problem that Win32 doesn't have an unsigned 64-bit int, or at least that was my understanding:

http://www.mail-archive.com/[email protected]/msg00434.html

However, after digging around on the internet, I think this was more a limitation of the Microsoft compiler than of Windows itself, and chances are that from MSVC 7.0 (aka .NET) onwards, an unsigned __int64 type has been available. As PyTables 2.0 will only work with Python 2.4 onwards, and as Python 2.4 (and Python 2.5) is compiled with MSVC 7.1, it could be that Windows is no longer a problem in this area. Can any Windows-savvy person here confirm this?

Still, there is the question of performance. The study above was made on a CPU with 64 KB of L2 cache, and nowadays I guess 128 KB is a far more common figure for still-in-use CPUs. It would be nice if somebody with access to a 128 KB machine could reproduce those benchmarks and see whether the computing kernel of the PyTables version of numexpr could accept more datatypes without hurting performance too much. In any case, as 2.0 is about to be released *now*, I think we should consider the average L2 cache size of CPUs hitting the market *now*, so I'd say that 512 KB would be a good target for PyTables purposes.
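As a quick way to check the compiler point above from Python itself: ctypes (in the standard library from Python 2.5 on, and available as a separate package before that) mirrors the platform C compiler's types, so it shows whether the toolchain really provides an unsigned 64-bit integer. A minimal sketch:

```python
import ctypes

# ctypes.c_uint64 maps to the platform compiler's unsigned 64-bit type
# (MSVC spells it `unsigned __int64`, C99 compilers `unsigned long long`).
# If this prints 8, the toolchain has a genuine unsigned 64-bit integer.
print(ctypes.sizeof(ctypes.c_uint64))    # 8
print(ctypes.c_uint64(2**64 - 1).value)  # 18446744073709551615
```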
If somebody complains about bad performance with selections, then they only have to upgrade to a recent CPU with a decent on-die L2 cache ;)

> If not, maybe PyTables could pretend that uint64 columns are actually
> 8-character strings using a numpy ndarray.view('S8')? These sort in the
> same order (if a bit more slowly). Then everything should be OK
> providing the expressions are simple equality/inequality comparisons.
>
> Then I could alter my code from:
>
>     where_str = '((C_PROD=="%s")&(N_ACCT>=%d)&(N_ACCT<=%d))' % \
>                 (logo, acct_min, acct_max)
>
> to:
>
>     where_str = '((C_PROD=="%s")&(N_ACCT>="%d")&(N_ACCT<="%d"))' % \
>                 (logo, acct_min, acct_max)

Yeah, smart trick. However, it somewhat breaks the semantics of numerical expressions, so let's hope we don't have to follow this path.

Cheers,

--
Francesc Altet   |  Be careful about using the following code --
Carabos Coop. V. |  I've only proven that it works,
www.carabos.com  |  I haven't tested it.  -- Donald Knuth

_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users
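Coming back to the view('S8') trick quoted above: it preserves ordering only because 8-byte *big-endian* strings compare lexicographically in the same order as the unsigned integers they encode, so on a little-endian machine the array would need an `astype('>u8')` conversion before the view. A minimal stdlib sketch of that ordering property, with invented sample values:

```python
import struct

# Made-up uint64 sample values, including boundary cases.
vals = [0, 1, 255, 256, 2**32, 2**64 - 1, 12345]

# Pack each value as 8 big-endian bytes ('>Q'); sorting the packed
# strings lexicographically then matches sorting the values numerically.
packed = sorted(struct.pack('>Q', v) for v in vals)
unpacked = [struct.unpack('>Q', p)[0] for p in packed]
print(unpacked == sorted(vals))  # True
```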
