Hi Stephen,

On Wednesday, 21 March 2007 at 18:41 +1100, Stephen Simmons wrote:
> Hi everyone,
> 
> I've just started using the PyTables 2.0 beta, and am very impressed, 
> especially with the complex where clauses that numexpr makes possible.
> 
> I have hit one stumbling block in one of my tables, though: selecting on 
> a uint64 column. This fails with:
> 
>   <type 'exceptions.NotImplementedError'>: variable ``N_ACCT`` refers to
>   a 64-bit unsigned integer column, not yet supported in conditions,
>   sorry; please use regular Python selections
>   WARNING: Failure executing file: <RBPresults.py>
> 
> Maybe this has been/will be fixed in a newer version of numexpr.

Well, I'm not so sure. The numexpr developers are quite conservative
about adding support for more datatypes than strictly necessary,
because they fear (with good reason) that doing so could hurt
performance. In fact, this effect has already been observed in the
improved version of numexpr that ships with PyTables:

http://article.gmane.org/gmane.comp.python.numeric.general/14130/match=numexpr
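
In the meantime, the "regular Python selections" that the error message
suggests would look something like the sketch below. The file and table
names here are made up, and I'm assuming a table with the C_PROD and
N_ACCT columns from your example:

    import tables

    # Hypothetical file and table; the point is the Python-level
    # filter that replaces the unsupported in-kernel condition on the
    # uint64 column.
    logo, acct_min, acct_max = 'VISA', 1000, 2000  # example values

    h5file = tables.openFile('accounts.h5', mode='r')
    table = h5file.root.accounts
    selected = [row['N_ACCT'] for row in table.iterrows()
                if row['C_PROD'] == logo
                and acct_min <= row['N_ACCT'] <= acct_max]
    h5file.close()

This visits every row in Python, so it is much slower than an in-kernel
query, but it sidesteps numexpr entirely.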

There is also the problem that Win32 doesn't have an unsigned 64-bit
int, or at least that was my understanding:

http://www.mail-archive.com/[email protected]/msg00434.html

However, after digging around on the internet, I think this was more a
limitation of the Microsoft compiler than of Windows itself, and there
is a good chance that from MSVC 7.0 (aka .NET) on, an unsigned __int64
type has been available. As PyTables 2.0 will only work with Python 2.4
onwards, and as it happens that Python 2.4 (and Python 2.5) were
compiled with MSVC 7.1, Windows may no longer be a problem in this
area. Can any Windows-savvy person here confirm this?
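
For whoever wants to check a particular Windows build, a quick sanity
test (just a sketch) is to ask numpy whether its uint64 really spans
the full 64-bit range:

    import numpy

    # A true unsigned 64-bit type must reach 2**64 - 1; a crippled
    # build would report a smaller maximum here.
    assert numpy.iinfo(numpy.uint64).max == 2**64 - 1
    print(numpy.iinfo(numpy.uint64).max)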

Still, there is the problem of performance. The study above was made
on a CPU with 64 KB of L2 cache, whereas nowadays I'd guess that 128 KB
is a far more common figure for CPUs still in use. It would be nice if
somebody with access to a 128 KB machine could reproduce those
benchmarks and see whether the computing kernel of the PyTables version
of numexpr could accept more datatypes without hurting performance too
much.
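
In case it helps whoever takes this up, the benchmark essentially boils
down to timing numexpr's evaluate() against plain numpy on arrays much
larger than the cache. A rough sketch with the standalone numexpr (the
array size and the expression are arbitrary choices):

    import timeit
    import numpy as np
    import numexpr as ne

    a = np.random.rand(10**7)
    b = np.random.rand(10**7)

    # Time the numexpr kernel and plain numpy on the same expression;
    # widening the set of supported datatypes shouldn't slow down the
    # former.
    t_ne = timeit.timeit(
        lambda: ne.evaluate('2*a + 3*b', local_dict={'a': a, 'b': b}),
        number=10)
    t_np = timeit.timeit(lambda: 2*a + 3*b, number=10)
    print('numexpr: %.3fs  numpy: %.3fs' % (t_ne, t_np))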

In any case, as 2.0 is about to be released *now*, I think we should
pay attention to the typical L2 cache sizes of the CPUs hitting the
market *now*. So I'd say that 512 KB could be a good target for
PyTables purposes. If somebody then complains about poor selection
performance, they only have to upgrade to a recent CPU with a decent
on-die L2 cache ;)

> If not, maybe PyTables could pretend that uint64 columns are actually 
> 8-character strings using a numpy ndarray.view('S8')? These sort in the 
> same order (if a bit more slowly). Then everything should be OK 
> providing the expressions are simple equality/inequality comparisons.
> 
> Then I could alter my code from:
>         where_str = '((C_PROD=="%s")&(N_ACCT>=%d)&(N_ACCT<=%d))' % (logo, acct_min, acct_max)
> to
>         where_str = '((C_PROD=="%s")&(N_ACCT>="%d")&(N_ACCT<="%d"))' % (logo, acct_min, acct_max)

Yeah, smart trick. However, it somewhat breaks the semantics of the
numerical expressions, so let's hope we don't have to follow this
path.
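
For the record, here is roughly what that trick looks like in NumPy,
together with the byte-order wrinkle it glosses over (a sketch; the
values are arbitrary):

    import numpy as np

    # The raw bytes of a uint64 only sort like the numbers themselves
    # in big-endian order, so little-endian data needs a byteswap
    # (done here via an astype to '>u8') before the view to 'S8'.
    vals = np.array([3, 1, 2**40, 255], dtype=np.uint64)
    keys = vals.astype('>u8').view('S8')

    # String order now agrees with numeric order.
    assert (np.argsort(keys) == np.argsort(vals)).all()

So on a little-endian machine the column couldn't even be viewed in
place; it would have to be byteswapped first, which is one more way
the trick bends the semantics.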

Cheers,

-- 
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works, 
www.carabos.com   |  I haven't tested it. -- Donald Knuth

