Re: [Numpy-discussion] Back to numexpr
Ivan Vilata i Balaguer wrote: >En/na Tim Hochberg ha escrit:: > > > >>Francesc Altet wrote: >>[...] >> >> >>>Uh, I'm afraid that yes. In PyTables, int64, while being a bit bizarre for >>>some users (especially in 32-bit platforms), is a type with the same rights >>>as the others and we would like to give support for it in numexpr. In >>>fact, >>>Ivan Vilata already has implemented this support in our local copy of >>>numexpr, >>>so perhaps (I say perhaps because we are in the middle of a big project now >>>and are a bit short of time resources) we can provide the patch against the >>>latest version of David for your consideration. With this we can solve the >>>problem with int64 support in 32-bit platforms (although admittedly, the VM >>>gets a bit more complicated, I really think that this is worth the effort) >>> >>> >>In addition to complexity, I worry that we'll overflow the code cache at >>some point and slow everything down. To be honest I have no idea at what >>point that is likely to happen, but I know they worry about it with the >>Python interpreter mainloop. Also, it becomes much, much slower to >>compile past a certain number of case statements under VC7, not sure >>why. That's mostly my problem though. >>[...] >> >> > >Hi! For your information, the addition of separate, predictably-sized >int (int32) and long (int64) types to numexpr was roughly as complicated >as the addition of boolean types, so maybe the increase of complexity >isn't that important (but I recognise I don't know the effect on the >final size of the VM). > > I didn't expect it to be any worse than booleans (I would imagine it's about the same). It's just that there's a point at which we are going to slow down the VM due to sheer size. I don't know where that point is, so I'm cautious. Booleans seem like they need to be supported directly in the interpreter, while only one each (the largest one) of ints, floats and complexes do. 
Booleans are different since they have different behaviour from integers, so they need a separate set of opcodes. For floats and complexes, the largest is also the most commonly used, so this works out well. For ints on the other hand, int32 is the most commonly used, but int64 is the largest, so the approach of using the largest is going to result in a speed hit for the most common integer case. Implementing both, as you've done, solves that, but as I say, I worry about making the interpreter core too big. I expect that you've timed things before and after the addition of int64 and not gotten a noticeable slowdown. That's good, although it doesn't entirely mean we're out of the woods, since I expect that more opcodes that we just need to add will show up, and at some point we may run into an opcode crunch. Or maybe I'm just being paranoid. >As soon as I have time (and an SVN version of numexpr which passes the >tests ;) ) I will try to merge back the changes and send a patch to the >list. Thanks for your patience! :) > > I look forward to seeing it. Now if only I can get svn numexpr to stop segfaulting under Windows I'll be able to do something useful... -tim >:: > > Ivan Vilata i Balaguer >qo< http://www.carabos.com/ > Cárabos Coop. V. V V Enjoy Data > "" > > > ___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
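Tim's point that booleans need their own opcodes (rather than reusing the integer ones) can be seen directly in numpy. A minimal illustration; the arrays here are just examples, not numexpr internals:

```python
import numpy as np

b = np.array([True, False])
i = b.astype(np.int32)

# Boolean invert flips truth values:
assert list(~b) == [False, True]
# The same operator on integers is a bitwise complement, which does not
# even produce valid truth values (~1 == -2, ~0 == -1):
assert list(~i) == [-2, -1]
```

So routing booleans through the integer opcodes would give wrong answers for ~, &, and |, independent of any memory-bandwidth benefit.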
Re: [Numpy-discussion] Back to numexpr
En/na Tim Hochberg ha escrit:: > Francesc Altet wrote: > [...] >>Uh, I'm afraid that yes. In PyTables, int64, while being a bit bizarre for >>some users (especially in 32-bit platforms), is a type with the same rights >>as the others and we would like to give support for it in numexpr. In fact, >>Ivan Vilata already has implemented this support in our local copy of numexpr, >>so perhaps (I say perhaps because we are in the middle of a big project now >>and are a bit short of time resources) we can provide the patch against the >>latest version of David for your consideration. With this we can solve the >>problem with int64 support in 32-bit platforms (although admittedly, the VM >>gets a bit more complicated, I really think that this is worth the effort) > > In addition to complexity, I worry that we'll overflow the code cache at > some point and slow everything down. To be honest I have no idea at what > point that is likely to happen, but I know they worry about it with the > Python interpreter mainloop. Also, it becomes much, much slower to > compile past a certain number of case statements under VC7, not sure > why. That's mostly my problem though. > [...] Hi! For your information, the addition of separate, predictably-sized int (int32) and long (int64) types to numexpr was roughly as complicated as the addition of boolean types, so maybe the increase of complexity isn't that important (but I recognise I don't know the effect on the final size of the VM). As soon as I have time (and an SVN version of numexpr which passes the tests ;) ) I will try to merge back the changes and send a patch to the list. Thanks for your patience! :) :: Ivan Vilata i Balaguer >qo< http://www.carabos.com/ Cárabos Coop. V. V V Enjoy Data ""
Re: [Numpy-discussion] Back to numexpr
Francesc Altet wrote: >A Dimarts 13 Juny 2006 20:46, Tim Hochberg va escriure: > > >>>Uh, I'm afraid that yes. In PyTables, int64, while being a bit bizarre for >>>some users (especially in 32-bit platforms), is a type with the same rights >>>as the others and we would like to give support for it in numexpr. In >>>fact, Ivan Vilata already has implemented this support in our local copy >>>of numexpr, so perhaps (I say perhaps because we are in the middle of a >>>big project now and are a bit short of time resources) we can provide >>>the patch against the latest version of David for your consideration. >>>With this we can solve the problem with int64 support in 32-bit platforms >>>(although admittedly, the VM gets a bit more complicated, I really think >>>that this is worth the effort) >>> >>> >>In addition to complexity, I worry that we'll overflow the code cache at >>some point and slow everything down. To be honest I have no idea at what >>point that is likely to happen, but I know they worry about it with the >>Python interpreter mainloop. >> >> > >That's true. I didn't think about this :-/ > > > >>Also, it becomes much, much slower to >>compile past a certain number of case statements under VC7, not sure >>why. That's mostly my problem though. >> >> > >No, this is a general problem (I'd say even more in GCC, because the optimizer >runs so slow). However, this should only affect poor developers, not >users, and besides, we should find a solution for int64 in 32-bit platforms. > > Yeah. This is just me whining. Under VC7, there is a very sudden change when adding more cases where compile times go from seconds to minutes. I think we're already past that now anyway, so slowing that down more isn't going to hurt me. Overflowing the cache is the real thing I worry about. >>One idea that might be worth trying for int64 is to special-case them >>using functions. That is, using OP_FUNC_LL and OP_FUNC_LLL and some >>casting opcodes. 
This could support int64 with relatively few new >>opcodes. There's obviously some extra overhead introduced here by the >>function call. How much this matters is probably a function of how well >>the compiler / hardware supports int64 to begin with. >> >> > >Mmm, in my experience int64 operations are reasonably well supported by modern >32-bit processors (IIRC they normally take twice the time of int32 ops). > >The problem with using a long for representing ints in numexpr is that we have >the duality of being represented differently in 32/64-bit platforms and that >could be a headache in the long term (int64 support in 32-bit platforms is only >one issue, but there could be more). IMHO, it is much better to assign the >role for ints in numexpr to a unique datatype, and this should be int64, for >the sake of wide int64 support, but also for future (and present!) 64-bit >processors. The problem would be that operations with 32-bit ints on 32-bit >processors can be slowed down by a factor of 2x (or more, because there is a >casting now), but in exchange, we have fully portable code and int64 support. > > This certainly makes things simpler. I think that this would be fine with me since I mostly use float and complex, so the speed issue wouldn't hit me much. But that's 'cause I'm selfish that way ;-) >In case we decide to go this way, we have two options here: keep the VM >simple and advertise that int32 arithmetic in numexpr on 32-bit platforms >will be sub-optimal, or, as we already have done, add the proper machinery to >support both integer types separately (at the expense of making the VM more >complex). Or perhaps David can come up with a better solution (vmgen from >gforth? no idea what this is, but the name sounds sexy;-) > > Yeah! >>That brings up another point. We probably don't want to have casting >>opcodes from/to everything. Given that there are 8 types on the table >>now, if we support every casting opcode we're going to have 56(?) >>opcodes just for casting. 
I imagine what we'll have to do is write a >>cast from int16 to float as OP_CAST_Ii; OP_CAST_FI; trading an extra >>step in these cases for keeping the number of casting opcodes under >>control. Once again, int64 is problematic since you lose precision >>casting to int. I guess in this case you could get by with being able to >>cast back and forth to float and int. No need to cast directly to >>booleans, etc., as two-stage casting should suffice for this. >> >> > >Well, we already thought about this. Not only can you not safely cast an int64 >to an int32 without losing precision, but what is worse, you can't even >cast it to any other commonly available datatype (casting to a float64 will >also lose precision). And, although you can afford losing precision when >dealing with floating-point data in some scenarios (but certainly not in a >general-purpose library like numexpr tries to be), it is by no means >acceptable to lose 'precision' in ints. So, to >my mind, the only solution is completely avoiding casting int64 to any other type.
Re: [Numpy-discussion] Back to numexpr
On Tue, 13 Jun 2006 21:30:41 +0200 Francesc Altet <[EMAIL PROTECTED]> wrote: > A Dimarts 13 Juny 2006 20:46, Tim Hochberg va escriure: > > >Uh, I'm afraid that yes. In PyTables, int64, while being a bit bizarre > > >for some users (especially in 32-bit platforms), is a type with the same > > >rights as the others and we would like to give support for it in > > >numexpr. In > > > fact, Ivan Vilata already has implemented this support in our local copy > > > of numexpr, so perhaps (I say perhaps because we are in the middle of a > > > big project now and are a bit short of time resources) we can provide > > > the patch against the latest version of David for your consideration. > > > With this we can solve the problem with int64 support in 32-bit > > > platforms (although admittedly, the VM gets a bit more complicated, I > > > really think that this is worth the effort) > > > > In addition to complexity, I worry that we'll overflow the code cache at > > some point and slow everything down. To be honest I have no idea at what > > point that is likely to happen, but I know they worry about it with the > > Python interpreter mainloop. > > That's true. I didn't think about this :-/ > > > Also, it becomes much, much slower to > > compile past a certain number of case statements under VC7, not sure > > why. That's mostly my problem though. > > No, this is a general problem (I'd say even more in GCC, because the > optimizer runs so slow). However, this should only affect poor > developers, not users, and besides, we should find a solution for int64 on > 32-bit platforms. If I switch to vmgen, it can easily make two versions of the code: one using a case statement, and another direct-threaded version for GCC (which supports taking the address of a label, and doing a 'goto' to a variable). Won't solve the I-cache problem, though. And there's always subroutine threading (each opcode is a function, and the program is a list of function pointers). 
We won't know until we try :) > Or perhaps > David can come up with a better solution (vmgen from gforth? no idea what this > is, but the name sounds sexy;-) The docs for it are at http://www.complang.tuwien.ac.at/anton/vmgen/html-docs/ -- |>|\/|< /--\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |[EMAIL PROTECTED]
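The subroutine-threading idea David mentions can be sketched as a toy model in Python rather than C (the opcode names and stack discipline here are invented for illustration, not numexpr's VM): the compiled "program" is just a list of callables, so the dispatch loop contains no switch statement at all.

```python
def op_add(stack):            # each "opcode" is a plain function
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def op_mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

def make_push(value):         # a closure standing in for an immediate operand
    def op_push(stack):
        stack.append(value)
    return op_push

def run(program):
    stack = []
    for op in program:        # the whole dispatch loop: just indirect calls
        op(stack)
    return stack[-1]

# (2 + 3) * 4
program = [make_push(2), make_push(3), op_add, make_push(4), op_mul]
assert run(program) == 20
```

In C the trade-off is between one big switch (whose body competes for the I-cache), GCC's computed-goto direct threading, and this function-pointer style, which keeps each opcode's code separate at the cost of a call per instruction.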
Re: [Numpy-discussion] Back to numexpr
A Dimarts 13 Juny 2006 20:46, Tim Hochberg va escriure: > >Uh, I'm afraid that yes. In PyTables, int64, while being a bit bizarre for > >some users (especially in 32-bit platforms), is a type with the same rights > >as the others and we would like to give support for it in numexpr. In > > fact, Ivan Vilata already has implemented this support in our local copy > > of numexpr, so perhaps (I say perhaps because we are in the middle of a > > big project now and are a bit short of time resources) we can provide > > the patch against the latest version of David for your consideration. > > With this we can solve the problem with int64 support in 32-bit platforms > > (although admittedly, the VM gets a bit more complicated, I really think > > that this is worth the effort) > > In addition to complexity, I worry that we'll overflow the code cache at > some point and slow everything down. To be honest I have no idea at what > point that is likely to happen, but I know they worry about it with the > Python interpreter mainloop. That's true. I didn't think about this :-/ > Also, it becomes much, much slower to > compile past a certain number of case statements under VC7, not sure > why. That's mostly my problem though. No, this is a general problem (I'd say even more in GCC, because the optimizer runs so slow). However, this should only affect poor developers, not users, and besides, we should find a solution for int64 on 32-bit platforms. > One idea that might be worth trying for int64 is to special-case them > using functions. That is, using OP_FUNC_LL and OP_FUNC_LLL and some > casting opcodes. This could support int64 with relatively few new > opcodes. There's obviously some extra overhead introduced here by the > function call. How much this matters is probably a function of how well > the compiler / hardware supports int64 to begin with. 
Mmm, in my experience int64 operations are reasonably well supported by modern 32-bit processors (IIRC they normally take twice the time of int32 ops). The problem with using a long for representing ints in numexpr is that we have the duality of being represented differently on 32/64-bit platforms, and that could be a headache in the long term (int64 support in 32-bit platforms is only one issue, but there could be more). IMHO, it is much better to assign the role for ints in numexpr to a unique datatype, and this should be int64, for the sake of wide int64 support, but also for future (and present!) 64-bit processors. The problem would be that operations with 32-bit ints on 32-bit processors can be slowed down by a factor of 2x (or more, because there is a casting now), but in exchange, we have fully portable code and int64 support. In case we decide to go this way, we have two options here: keep the VM simple and advertise that int32 arithmetic in numexpr on 32-bit platforms will be sub-optimal, or, as we already have done, add the proper machinery to support both integer types separately (at the expense of making the VM more complex). Or perhaps David can come up with a better solution (vmgen from gforth? no idea what this is, but the name sounds sexy;-) > > That brings up another point. We probably don't want to have casting > opcodes from/to everything. Given that there are 8 types on the table > now, if we support every casting opcode we're going to have 56(?) > opcodes just for casting. I imagine what we'll have to do is write a > cast from int16 to float as OP_CAST_Ii; OP_CAST_FI; trading an extra > step in these cases for keeping the number of casting opcodes under > control. Once again, int64 is problematic since you lose precision > casting to int. I guess in this case you could get by with being able to > cast back and forth to float and int. No need to cast directly to > booleans, etc., as two-stage casting should suffice for this. 
Well, we already thought about this. Not only can you not safely cast an int64 to an int32 without losing precision, but what is worse, you can't even cast it to any other commonly available datatype (casting to a float64 will also lose precision). And, although you can afford losing precision when dealing with floating-point data in some scenarios (but certainly not in a general-purpose library like numexpr tries to be), it is by no means acceptable to lose 'precision' in ints. So, to my mind, the only solution is completely avoiding casting int64 to any other type. Cheers, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"
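Francesc's point about int64-to-float64 casts is easy to demonstrate: a Python float is an IEEE 754 double with a 53-bit significand, so int64 values above 2**53 cannot round-trip through it.

```python
big = 2**53 + 1                        # representable exactly in int64
assert float(big) == float(2**53)      # the low bit is silently dropped
assert int(float(big)) != big          # so the round-trip is lossy
```

This is why, unlike int16 or float32, int64 has no wider type to be promoted to, and either needs its own opcodes or must be excluded from the cast-and-widen scheme.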
Re: [Numpy-discussion] Back to numexpr
Francesc Altet wrote: >Ei, numexpr seems to be back, wow! :-D > >A Dimarts 13 Juny 2006 18:56, Tim Hochberg va escriure: > > >>I've finally got around to looking at numexpr again. Specifically, I'm >>looking at Francesc Altet's numexpr-0.2, with the idea of harmonizing >>the two versions. Let me go through his list of enhancements and comment >>(my comments are dedented): >> >> > >Well, as David already said, he committed most of my additions some days >ago :-) > > > >>- Enhanced performance for strided and unaligned data, especially for >>lightweight computations (e.g. 'a>10'). With this and the addition of >>the boolean type, we can get up to 2x better times than previous >>versions. Also, most of the supported computations go faster than >>with numpy or numarray, even the simplest ones. >> >>Francesc, if you're out there, can you briefly describe what this >>support consists of? It's been long enough since I was messing with this >>that it's going to take me a while to untangle NumExpr_run, where I >>expect it's lurking, so any hints would be appreciated. >> >> > >This is easy. When dealing with strided or unaligned vectors, instead of >copying them completely to well-behaved arrays, they are copied only when the >virtual machine needs the appropriate blocks. With this, there is no need to >write the well-behaved array back into main memory, which can become an >important bottleneck, especially when dealing with large arrays. This allows a >better use of the processor caches because data is cached and used only when >the VM needs it. Also, I see that David has added support for byteswapped >arrays, which is great! > > I'm looking at this now. I imagine it will become clear eventually. I've clearly forgotten some stuff over the last few months. Sigh. First I need to get it to compile here. It seems that a few GCCisms have crept back in. [SNIP] >>rarely used. >> >> > >Uh, I'm afraid that yes. 
In PyTables, int64, while being a bit bizarre for >some users (especially in 32-bit platforms), is a type with the same rights >as the others and we would like to give support for it in numexpr. In fact, >Ivan Vilata already has implemented this support in our local copy of numexpr, >so perhaps (I say perhaps because we are in the middle of a big project now >and are a bit short of time resources) we can provide the patch against the >latest version of David for your consideration. With this we can solve the >problem with int64 support in 32-bit platforms (although admittedly, the VM >gets a bit more complicated, I really think that this is worth the effort) > > In addition to complexity, I worry that we'll overflow the code cache at some point and slow everything down. To be honest I have no idea at what point that is likely to happen, but I know they worry about it with the Python interpreter mainloop. Also, it becomes much, much slower to compile past a certain number of case statements under VC7, not sure why. That's mostly my problem though. One idea that might be worth trying for int64 is to special-case them using functions. That is, using OP_FUNC_LL and OP_FUNC_LLL and some casting opcodes. This could support int64 with relatively few new opcodes. There's obviously some extra overhead introduced here by the function call. How much this matters is probably a function of how well the compiler / hardware supports int64 to begin with. That brings up another point. We probably don't want to have casting opcodes from/to everything. Given that there are 8 types on the table now, if we support every casting opcode we're going to have 56(?) opcodes just for casting. I imagine what we'll have to do is write a cast from int16 to float as OP_CAST_Ii; OP_CAST_FI; trading an extra step in these cases for keeping the number of casting opcodes under control. Once again, int64 is problematic since you lose precision casting to int. 
I guess in this case you could get by with being able to cast back and forth to float and int. No need to cast directly to booleans, etc., as two-stage casting should suffice for this. -tim
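The two-stage casting scheme Tim describes can be sketched as a tiny compiler rule (the type names and opcode tuples below are illustrative, not numexpr's actual opcode table): each narrow type is widened to the largest type of its kind, and only wide-to-wide casts cross kinds.

```python
# Widest type of each kind; int64 is deliberately absent, since (as
# discussed in the thread) it cannot be widened without losing precision.
WIDE = {'int16': 'int', 'int': 'int',
        'float32': 'float', 'float': 'float',
        'complex64': 'complex', 'complex': 'complex'}

def compile_cast(src, dst):
    """Emit a cast as at most three opcodes: widen, convert, narrow."""
    ops = []
    if src != WIDE[src]:
        ops.append(('CAST', src, WIDE[src]))        # e.g. int16 -> int
    if WIDE[src] != WIDE[dst]:
        ops.append(('CAST', WIDE[src], WIDE[dst]))  # e.g. int -> float
    if dst != WIDE[dst]:
        ops.append(('CAST', WIDE[dst], dst))        # e.g. float -> float32
    return ops

# int16 -> float takes two steps instead of a dedicated int16->float opcode:
assert compile_cast('int16', 'float') == [('CAST', 'int16', 'int'),
                                          ('CAST', 'int', 'float')]
```

Each new narrow type then costs only two cast opcodes (to and from its wide type), which is where the thread's figure of six new opcodes for int16, float32, and complex64 comes from, versus roughly 8*7 = 56 if every (src, dst) pair got its own opcode.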
Re: [Numpy-discussion] Back to numexpr
A Dimarts 13 Juny 2006 19:47, Francesc Altet va escriure: > > - Support for both numpy and numarray (use the flag --force-numarray > > in setup.py). > > > > At first glance this looks like it doesn't make things too messy, so I'm > > in favor of incorporating this. > > Yeah. I think you are right. It's only that we need this for our own things > :) Ooops! Small correction here. I thought that you were saying that you were *not* in favour of supporting numarray as well, but you clearly were. Sorry about the misunderstanding. Anyway, if David's idea of providing a thin numpy-compatible numarray layer is easy to implement, then great. Cheers, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"
Re: [Numpy-discussion] Back to numexpr
Ei, numexpr seems to be back, wow! :-D A Dimarts 13 Juny 2006 18:56, Tim Hochberg va escriure: > I've finally got around to looking at numexpr again. Specifically, I'm > looking at Francesc Altet's numexpr-0.2, with the idea of harmonizing > the two versions. Let me go through his list of enhancements and comment > (my comments are dedented): Well, as David already said, he committed most of my additions some days ago :-) > - Enhanced performance for strided and unaligned data, especially for > lightweight computations (e.g. 'a>10'). With this and the addition of > the boolean type, we can get up to 2x better times than previous > versions. Also, most of the supported computations go faster than > with numpy or numarray, even the simplest ones. > > Francesc, if you're out there, can you briefly describe what this > support consists of? It's been long enough since I was messing with this > that it's going to take me a while to untangle NumExpr_run, where I > expect it's lurking, so any hints would be appreciated. This is easy. When dealing with strided or unaligned vectors, instead of copying them completely to well-behaved arrays, they are copied only when the virtual machine needs the appropriate blocks. With this, there is no need to write the well-behaved array back into main memory, which can become an important bottleneck, especially when dealing with large arrays. This allows a better use of the processor caches because data is cached and used only when the VM needs it. Also, I see that David has added support for byteswapped arrays, which is great! > - Support for both numpy and numarray (use the flag --force-numarray > in setup.py). > > At first glance this looks like it doesn't make things too messy, so I'm > in favor of incorporating this. Yeah. I think you are right. It's only that we need this for our own things :) > - Add types for int16, int64 (in 32-bit platforms), float32, > complex64 (simple prec.) 
> > I have some specific ideas about how this should be accomplished. > Basically, I don't think we want to support every type in the same way, > since this is going to make the case statement blow up to an enormous > size. This may slow things down and at a minimum it will make things > less comprehensible. My thinking is that we only add casts for the extra > types and do the computations at high precision. Thus adding two int16 > numbers compiles to two OP_CAST_Ffs followed by an OP_ADD_FFF, and then > a OP_CAST_fF. The details are left as an exercise to the reader ;-). > So, adding int16, float32, complex64 should only require the addition of > 6 casting opcodes plus appropriate modifications to the compiler. > > For large arrays, this should have most of the benefits of giving each > type its own opcode, since the memory bandwidth is still small, while > keeping the interpreter relatively simple. Yes, I like the idea as well. > Unfortunately, int64 doesn't fit under this scheme; is it used enough to > matter? I hate to pile a whole pile of new opcodes on for something that's > rarely used. Uh, I'm afraid that yes. In PyTables, int64, while being a bit bizarre for some users (especially in 32-bit platforms), is a type with the same rights as the others and we would like to give support for it in numexpr. In fact, Ivan Vilata already has implemented this support in our local copy of numexpr, so perhaps (I say perhaps because we are in the middle of a big project now and are a bit short of time resources) we can provide the patch against the latest version of David for your consideration. With this we can solve the problem with int64 support in 32-bit platforms (although admittedly, the VM gets a bit more complicated, I really think that this is worth the effort). Cheers, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. 
Enjoy Data "-"
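Francesc's block-wise copying of strided operands can be sketched in numpy (the block size, function name, and one-operand expression are illustrative; numexpr's VM does this in C over compiled programs): rather than materializing a full contiguous copy of a strided array, copy and evaluate one cache-sized block at a time.

```python
import numpy as np

BLOCK = 128  # illustrative block size, not numexpr's actual choice

def blocked_greater(a, threshold):
    """Evaluate 'a > threshold' block by block on a possibly strided 1-D a."""
    out = np.empty(a.shape[0], dtype=bool)
    for start in range(0, a.shape[0], BLOCK):
        # Small contiguous copy of just this block; it stays hot in cache
        # and no full-sized well-behaved temporary is ever written back.
        chunk = np.ascontiguousarray(a[start:start + BLOCK])
        out[start:start + BLOCK] = chunk > threshold
    return out

a = np.arange(1000)[::3]  # a strided view
assert (blocked_greater(a, 10) == (a > 10)).all()
```

The win is largest for lightweight expressions like 'a>10', where memory traffic, not arithmetic, dominates the run time.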
Re: [Numpy-discussion] Back to numexpr
David M. Cooke wrote: >On Tue, Jun 13, 2006 at 09:56:37AM -0700, Tim Hochberg wrote: > > >>[SNIP] >> >> > >All the above is checked in already :-) > > So I see. Oops! > > >> - Support for both numpy and numarray (use the flag --force-numarray >> in setup.py). >> >>At first glance this looks like it doesn't make things too messy, so I'm >>in favor of incorporating this. >> >> > >... although I had ripped this all out. I'd rather have a numpy-compatible >numarray layer (at the C level, this means defining macros like PyArray_DATA) >than different code for each. > > Okey dokey. I don't feel strongly about this either way other than I'd rather have one version of numexpr around rather than two almost identical versions. Whatever makes that work would make me happy. > > >> - Added a new benchmark for testing boolean expressions and >> strided/unaligned arrays: boolean_timing.py >> >>Benchmarks are always good. >> >> > >Haven't checked that in yet. > > > >> Things that I want to address in the future: >> >> - Add tests on strided and unaligned data (currently only tested >> manually) >> >>Yep! Tests are good. >> >> - Add types for int16, int64 (in 32-bit platforms), float32, >> complex64 (simple prec.) >> >>I have some specific ideas about how this should be accomplished. >>Basically, I don't think we want to support every type in the same way, >>since this is going to make the case statement blow up to an enormous >>size. This may slow things down and at a minimum it will make things >>less comprehensible. >> >> > >I've been thinking how to generate the virtual machine programmatically, >specifically I've been looking at vmgen from gforth again. I've got other >half-formed ideas too (separate scalar machine for reductions?) that I'm >working on too. > >But yes, the # of types does make things harder to redo :-) > > > >>My thinking is that we only add casts for the extra >>types and do the computations at high precision. 
Thus adding two int16 >>numbers compiles to two OP_CAST_Ffs followed by an OP_ADD_FFF, and then >>a OP_CAST_fF. The details are left as an exercise to the reader ;-). >>So, adding int16, float32, complex64 should only require the addition of >>6 casting opcodes plus appropriate modifications to the compiler. >> >> > >My thinking too. > > Yeah! Although I'm not in a hurry on this part. I'm remembering now that the next item on my agenda was to work on supporting broadcasting. I don't exactly know how this is going to work, although I recall having something of a plan at some point. Perhaps the easiest way to start out is to just test the shapes of the input arrays for compatibility. If they're compatible and don't require broadcasting, proceed as now. If they are incompatible, raise a "ValueError: shape mismatch: objects cannot be broadcast to a single shape" as numpy does. If they are compatible, but require broadcasting, raise a NotImplementedError. This should be relatively easy and makes numexpr considerably more congruent with numpy. I'm hoping that, while working on that, my plan will pop back into my head ;-) [SNIP] Regards, -tim
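Tim's incremental broadcasting plan (accept exact shape matches, mimic numpy's error for incompatible shapes, and punt with NotImplementedError when real broadcasting would be needed) could look roughly like this; the function name and structure are invented for illustration:

```python
def check_shapes(*shapes):
    """First-cut shape check: exact match OK, mismatch raises,
    genuine broadcasting is not implemented yet."""
    ndim = max(len(s) for s in shapes)
    # Pad with leading 1s, as numpy's broadcasting rules do.
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    for dims in zip(*padded):
        non_unit = {d for d in dims if d != 1}
        if len(non_unit) > 1:          # truly incompatible, as in numpy
            raise ValueError("shape mismatch: objects cannot be "
                             "broadcast to a single shape")
    if len(set(padded)) > 1:           # compatible, but needs broadcasting
        raise NotImplementedError("broadcasting not supported yet")
    return padded[0]

assert check_shapes((3, 4), (3, 4)) == (3, 4)
```

This keeps numexpr's error behaviour congruent with numpy's from day one, while deferring the actual strided-iteration work that broadcasting requires.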
Re: [Numpy-discussion] Back to numexpr
On Tue, Jun 13, 2006 at 09:56:37AM -0700, Tim Hochberg wrote: > > I've finally got around to looking at numexpr again. Specifically, I'm > looking at Francesc Altet's numexpr-0.2, with the idea of harmonizing > the two versions. Let me go through his list of enhancements and comment > (my comments are dedented): > >- Addition of a boolean type. This allows better array copying times >for large arrays (lightweight computations are typically bounded by >memory bandwidth). > > Adding this to numexpr looks like a no-brainer. The behaviour of booleans > is different from that of integers, so in addition to being more memory > efficient, this enables boolean &, |, ~, etc to work properly. > >- Enhanced performance for strided and unaligned data, especially for >lightweight computations (e.g. 'a>10'). With this and the addition of >the boolean type, we can get up to 2x better times than previous >versions. Also, most of the supported computations go faster than >with numpy or numarray, even the simplest ones. > > Francesc, if you're out there, can you briefly describe what this > support consists of? It's been long enough since I was messing with this > that it's going to take me a while to untangle NumExpr_run, where I > expect it's lurking, so any hints would be appreciated. > >- Addition of ~, & and | operators (a la numarray.where) > > Sounds good. All the above is checked in already :-) >- Support for both numpy and numarray (use the flag --force-numarray >in setup.py). > > At first glance this looks like it doesn't make things too messy, so I'm > in favor of incorporating this. ... although I had ripped this all out. I'd rather have a numpy-compatible numarray layer (at the C level, this means defining macros like PyArray_DATA) than different code for each. >- Added a new benchmark for testing boolean expressions and >strided/unaligned arrays: boolean_timing.py > > Benchmarks are always good. Haven't checked that in yet. 
> >Things that I want to address in the future: > >- Add tests on strided and unaligned data (currently only tested >manually) > > Yep! Tests are good. > >- Add types for int16, int64 (in 32-bit platforms), float32, > complex64 (simple prec.) > > I have some specific ideas about how this should be accomplished. > Basically, I don't think we want to support every type in the same way, > since this is going to make the case statement blow up to an enormous > size. This may slow things down and at a minimum it will make things > less comprehensible. I've been thinking how to generate the virtual machine programmatically, specifically I've been looking at vmgen from gforth again. I've got other half-formed ideas too (separate scalar machine for reductions?) that I'm working on too. But yes, the # of types does make things harder to redo :-) > My thinking is that we only add casts for the extra > types and do the computations at high precision. Thus adding two int16 > numbers compiles to two OP_CAST_Ffs followed by an OP_ADD_FFF, and then > a OP_CAST_fF. The details are left as an exercise to the reader ;-). > So, adding int16, float32, complex64 should only require the addition of > 6 casting opcodes plus appropriate modifications to the compiler. My thinking too. > For large arrays, this should have most of the benefits of giving each > type its own opcode, since the memory bandwidth is still small, while > keeping the interpreter relatively simple. > > Unfortunately, int64 doesn't fit under this scheme; is it used enough to > matter? I hate to pile a whole pile of new opcodes on for something that's > rarely used. -- |>|\/|< /--\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |[EMAIL PROTECTED]
Re: [Numpy-discussion] Back to numexpr
Oops! Having just done an svn update, I now see that David appears to
have done most of this about a week ago... I'm behind the times.

-tim

Tim Hochberg wrote:

> I've finally got around to looking at numexpr again. Specifically, I'm
> looking at Francesc Altet's numexpr-0.2, with the idea of harmonizing
> the two versions. Let me go through his list of enhancements and
> comment (my comments are dedented):
>
>     - Addition of a boolean type. This allows better array copying
>       times for large arrays (lightweight computations are typically
>       bounded by memory bandwidth).
>
> Adding this to numexpr looks like a no-brainer. The behaviour of
> booleans differs from that of integers, so in addition to being more
> memory efficient, this enables boolean &, |, ~, etc. to work properly.
>
>     - Enhanced performance for strided and unaligned data, especially
>       for lightweight computations (e.g. 'a>10'). With this and the
>       addition of the boolean type, we can get up to 2x better times
>       than previous versions. Also, most of the supported computations
>       go faster than with numpy or numarray, even the simplest ones.
>
> Francesc, if you're out there, can you briefly describe what this
> support consists of? It's been long enough since I was messing with
> this that it's going to take me a while to untangle NumExpr_run, where
> I expect it's lurking, so any hints would be appreciated.
>
>     - Addition of ~, & and | operators (a la numarray.where)
>
> Sounds good.
>
>     - Support for both numpy and numarray (use the flag
>       --force-numarray in setup.py).
>
> At first glance this looks like it doesn't make things too messy, so
> I'm in favor of incorporating this.
>
>     - Added a new benchmark for testing boolean expressions and
>       strided/unaligned arrays: boolean_timing.py
>
> Benchmarks are always good.
>
>     Things that I want to address in the future:
>
>     - Add tests on strided and unaligned data (currently only tested
>       manually)
>
> Yep! Tests are good.
>     - Add types for int16, int64 (in 32-bit platforms), float32,
>       complex64 (single prec.)
>
> I have some specific ideas about how this should be accomplished.
> Basically, I don't think we want to support every type in the same
> way, since this is going to make the case statement blow up to an
> enormous size. This may slow things down, and at a minimum it will
> make things less comprehensible. My thinking is that we only add casts
> for the extra types and do the computations at high precision. Thus
> adding two int16 numbers compiles to two OP_CAST_Ffs followed by an
> OP_ADD_FFF, and then an OP_CAST_fF. The details are left as an
> exercise for the reader ;-). So, adding int16, float32, complex64
> should only require the addition of 6 casting opcodes plus appropriate
> modifications to the compiler.
>
> For large arrays, this should have most of the benefits of giving each
> type its own opcode, since the extra memory bandwidth is still small,
> while keeping the interpreter relatively simple.
>
> Unfortunately, int64 doesn't fit under this scheme; is it used enough
> to matter? I hate to pile a bunch of new opcodes on for something
> that's rarely used.
>
> Regards,
>
> -tim
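The observation that booleans behave differently from integers is the whole reason they need their own opcodes: reusing integer opcodes on boolean data gives wrong answers. A minimal pure-Python illustration; the opcode-style function names are invented:

```python
# Why booleans need dedicated opcodes: the integer versions give the
# wrong answers on boolean data.

def op_invert_int(x):    # an integer invert opcode: two's-complement ~
    return ~x

def op_invert_bool(x):   # a dedicated boolean invert opcode: logical not
    return not x

a = True
print(op_invert_int(a))   # -2: reusing the int opcode corrupts boolean data
print(op_invert_bool(a))  # False: the behaviour '~' should have on booleans

# Elementwise 'a & b' with a boolean opcode stays within {True, False}:
xs = [True, False, True]
ys = [True, True, False]
print([x and y for x, y in zip(xs, ys)])
```

Python's own `~True == -2` (True is the integer 1) is exactly the kind of surprise a separate boolean opcode set avoids, and it is also why the memory-efficiency argument (one byte per element instead of four or eight) comes for free once the type exists.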
[Numpy-discussion] Back to numexpr
I've finally got around to looking at numexpr again. Specifically, I'm
looking at Francesc Altet's numexpr-0.2, with the idea of harmonizing
the two versions. Let me go through his list of enhancements and
comment (my comments are dedented):

    - Addition of a boolean type. This allows better array copying
      times for large arrays (lightweight computations are typically
      bounded by memory bandwidth).

Adding this to numexpr looks like a no-brainer. The behaviour of
booleans differs from that of integers, so in addition to being more
memory efficient, this enables boolean &, |, ~, etc. to work properly.

    - Enhanced performance for strided and unaligned data, especially
      for lightweight computations (e.g. 'a>10'). With this and the
      addition of the boolean type, we can get up to 2x better times
      than previous versions. Also, most of the supported computations
      go faster than with numpy or numarray, even the simplest ones.

Francesc, if you're out there, can you briefly describe what this
support consists of? It's been long enough since I was messing with
this that it's going to take me a while to untangle NumExpr_run, where
I expect it's lurking, so any hints would be appreciated.

    - Addition of ~, & and | operators (a la numarray.where)

Sounds good.

    - Support for both numpy and numarray (use the flag
      --force-numarray in setup.py).

At first glance this looks like it doesn't make things too messy, so
I'm in favor of incorporating this.

    - Added a new benchmark for testing boolean expressions and
      strided/unaligned arrays: boolean_timing.py

Benchmarks are always good.

    Things that I want to address in the future:

    - Add tests on strided and unaligned data (currently only tested
      manually)

Yep! Tests are good.

    - Add types for int16, int64 (in 32-bit platforms), float32,
      complex64 (single prec.)

I have some specific ideas about how this should be accomplished.
Basically, I don't think we want to support every type in the same way,
since this is going to make the case statement blow up to an enormous
size. This may slow things down, and at a minimum it will make things
less comprehensible. My thinking is that we only add casts for the
extra types and do the computations at high precision. Thus adding two
int16 numbers compiles to two OP_CAST_Ffs followed by an OP_ADD_FFF,
and then an OP_CAST_fF. The details are left as an exercise for the
reader ;-). So, adding int16, float32, complex64 should only require
the addition of 6 casting opcodes plus appropriate modifications to the
compiler.

For large arrays, this should have most of the benefits of giving each
type its own opcode, since the extra memory bandwidth is still small,
while keeping the interpreter relatively simple.

Unfortunately, int64 doesn't fit under this scheme; is it used enough
to matter? I hate to pile a bunch of new opcodes on for something
that's rarely used.

Regards,

-tim
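On the closing int64 question: the cost of handling int64 by promotion alone is concrete, because these computations are bounded by memory bandwidth and 64-bit elements double the bytes moved per element. A quick check with the standard library's array module (the itemsize of 'i' is platform-dependent, commonly 4; 'q' is a signed 64-bit integer):

```python
from array import array

n = 1_000_000
a32 = array('i', [0]) * n   # C int: itemsize is platform-dependent, commonly 4
a64 = array('q', [0]) * n   # C long long: 8-byte signed integer

print(a32.itemsize, a64.itemsize)
# Extra bytes a memory-bound inner loop must move if int32 data is
# promoted to int64 before computing:
print(n * (a64.itemsize - a32.itemsize))
```

This is why "just use the largest int" is less attractive for integers than for floats: unlike float64, int64 is not the common case, so the doubled traffic is paid on every int32 expression.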