Re: [pygame] Python and Speed
The way I speed up my Python code is ctypes. I just build a DLL in C or asm and then call it with ctypes, and presto: tons of speed at my fingertips. Just my 2 cents :)
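Jason's ctypes approach can be sketched roughly like this. In a real game you would point `CDLL` at the `.dll`/`.so` you compiled from C or asm; here the system C math library stands in so the snippet runs as-is (the library name lookup is platform-dependent, so treat this as an illustration, not a recipe):

```python
import ctypes
import ctypes.util

# Load a shared library. For your own code this would be something like
# ctypes.CDLL("./my_fast.so") -- here we borrow the system math library.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the signature so ctypes converts Python floats correctly;
# without this, double arguments and return values get mangled.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(16.0))  # 4.0
```

Declaring `restype`/`argtypes` is the step people most often forget; ctypes defaults to `int` for return values, which silently corrupts doubles.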
Re: [pygame] Python and Speed
On Thu, Apr 17, 2008 at 2:21 PM, Greg Ewing [EMAIL PROTECTED] wrote:
> René Dudfield wrote:
>> 2. - asm optimizations. There seem to be almost no asm optimizations in CPython.
> That's a deliberate policy. One of the goals of CPython is to be very portable and written in a very straightforward way. Including special pieces of asm for particular architectures isn't usually considered worth the maintenance effort required.

Other, more portable software has proved this to be somewhat wrong, I think. Optional asm paths are used in a lot of software today to good effect. Also, this decision was made a while ago, and things have changed since then:
- Python now has unit tests, so verifying that the asm code works, and keeps working correctly, is much easier.
- x86 is now very common. Most mainstream servers and desktops use x86, so targeting just x86 gives you a lot more benefit now.
- SIMD instructions are the fast ones, so you don't actually have to learn all that much to write fast asm - only a subset. You can get a compiler to generate the first pass of a function and then hand-tune it. Writing the fastest possible asm still requires effort, but it is fairly easy for a novice to beat a compiler with SIMD code.
- Advanced compilers can generate asm which can then be used by weaker compilers: e.g. the Intel or VectorC compiler can generate asm that is then included in C code compiled by gcc.
- Libraries of fast, tested asm code are available, e.g. from AMD, Intel and others.
- Python and FOSS now have a much larger development community, with more asm experts.

>> CPython could use faster threading primitives, and more selective releasing of the GIL.
> Everyone would love to get rid of the GIL as well, but that's another Very Hard Problem about which there has been much discussion, but little in the way of workable ideas.

Yeah - not getting rid of the GIL entirely, but selectively releasing it. As an example, pygame releases the GIL around certain C functionality like pygame.transform.scale. FreeBSD and Linux have also followed this method - adding more fine-grained locking where it is worth it, and improving their threading primitives. I think there has already been work on fixing a lot of Python threading issues in the last year, but there's lots more to do. I'm using Python on 8-core machines for my workloads just fine today.

>> A way to know how much memory is being used. Memory profiling is the most important way to optimize since memory is quite slow compared to the speed of the cpu.
> Yes, but the amount of memory used doesn't necessarily have anything to do with the rate of memory accesses. Locality of reference, so that things stay in the cache, is more important.

If you are using 200 bytes for each int, then you can process 50x less data than with an int that takes up 4 bytes. If you have 1 GiB of available memory, and kjDict uses half the memory of a normal dict, then where a normal dict would use 2 GiB your kjDict uses 1 GiB - and in that case the kjDict would be massively faster than a normal dict because it avoids swapping. I think memory is one of the most important areas in optimising a program these days, so Python should provide tools to help measure memory use (how much memory things take, and how things allocate memory).

>> perhaps releasing a patch with a few selected asm optimizations might let the python developers realise how much faster python could be...
> Have you actually tried any of this? Measurement would be needed to tell whether these things address any of the actual bottlenecks in CPython.

You can try it easily yourself - compile Python with machine-specific optimisations (e.g. add -mtune=athlon to your gcc arguments). You can run this Python binary and get faster benchmarks. This provides the proof that more optimised assembly can run faster. Also, the link I gave to a commonly used memcpy function running 5x faster should provide another proof of the possibilities. Other software being sped up by asm optimisation provides another proof (including SDL, pygame, Linux, etc.). The Pawn language's virtual machine written in NASM is a lot faster than the version written in C - another proof. Psyco is another proof that asm can speed up Python (Psyco is a run-time assembler). The idea is that you only optimise key, stable functions in asm - not everything. For example, in SDL the blit functions are written in asm, with C fallback implementations. It's using the best tool for the job: Python at the highest level, then C, then asm. I think a patch for CPython would need to come with benchmarks as a proper proof, though - but hopefully the list above provides theoretical proof that adding asm optimisations would speed up CPython. However, the recompilation with CPU-specific compiler flags would only need: - cpu
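The per-int memory overhead mentioned above can be checked from a stock Python prompt: sys.getsizeof reports the per-object footprint, and the array module shows the packed alternative (a rough illustration of the kind of measurement tooling being asked for, not a full memory profiler):

```python
import sys
import array

# A CPython int carries object-header overhead well beyond the 4 bytes
# a C int needs (~28 bytes on a 64-bit CPython 3.x build).
print(sys.getsizeof(1))

# A list stores references, each pointing at a full int object.
nums = list(range(1000))
print(sys.getsizeof(nums))

# array.array stores raw machine values, cutting per-element overhead
# to the C size ('i' is a C int, typically 4 bytes).
packed = array.array('i', range(1000))
print(packed.itemsize, packed.itemsize * len(packed))
```

numpy takes the same packed-storage idea further and adds fast vectorised operations on top.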
Re: [pygame] Unexpected blit behavior when using BLEND_ADD
On Wed, Apr 16, 2008, Lenard Lindstrom wrote:
> I was trying out BLEND_ADD with blit to see if an alpha plane could be added to a surface with alphas of zero. Here is the trial session: [...]

Both surfaces contain alpha, so only alpha blending will be applied. The BLEND_* args only work for cases where alpha blending cannot be applied. Should we change this to take alpha into account for the BLEND_* args as well, and only apply the default alpha blending if no BLEND_* args are passed?

Regards
Marcus
Re: Re: [pygame] Python and Speed
René Dudfield [EMAIL PROTECTED] wrote:
> On Thu, Apr 17, 2008 at 2:21 PM, Greg Ewing [EMAIL PROTECTED] wrote:
> [... the asm optimization discussion quoted in full ...]

I think this is the wrong forum to be having this discussion :)

Richard
Re: [pygame] Unexpected blit behavior when using BLEND_ADD
On Wed, Apr 16, 2008 at 11:48 PM, Marcus von Appen [EMAIL PROTECTED] wrote:
> Both surfaces contain alpha, so only alpha blending will be applied. The BLEND_* args only work for cases where alpha blending cannot be applied. Should we change this to take alpha into account for the BLEND_* args as well, and only apply the default alpha blending if no BLEND_* args are passed?

I think if somebody passes the BLEND_* args, their intent is pretty clear: they want you to respect the blend args passed, regardless of whether the surface has alpha. If you pass BLEND_* args and they don't change the operation at all, that seems like useless behavior for the API. ... That being said, I absolutely think the end result of Lenard's test is right - specifically, if I do an additive blit with a source whose alpha is zero, the image should be unchanged, because RGBA is a color+transparency mode in other contexts (i.e. an alpha of 0 means do nothing). In terms of art production, alpha'ing out pixels is a great way to make art that is a particular color but only lightly applied. Also, it can be useful to render the same art with both multiplicative and additive blending to get a glowy effect that doesn't saturate to white.
Re: Re: [pygame] Python and Speed
Hi!

No, this is the place to discuss it: if we wish to make games, work with existing platforms, and want speed, this is the way to go. Now that we have had this discussion and found solutions, we have a list of ways to resolve it. This is the place to discuss all of this; it brings to the front the issues of speed, connections, and overall solutions - the only way to make Pygame better, faster and competitive with the world. Just like the question I had with TTS (text to speech): even though I do not use the video end yet, I do use the sound end. So Ian's speed question is a very good one - take a look at what came of it below. I learn by doing; examples help because I get to use and tweak them, and eventually, like many have done, come up with a better solution or firm conclusion. I now understand adding options into my setup.py file - or, as I now call it, setup4tts.py - or anything for any need...

Bruce

From: Richard Jones
> I think this is the wrong forum to be having this discussion :)

From: Jason Ward
> The way I speed up my python code is Ctypes. I just make a dll file in C or asm and then call it with Ctypes and presto. I have tons of speed at my fingertips. Just my 2 cents :)

[... the full René Dudfield / Greg Ewing asm-optimization exchange quoted above ...]
Re: Re: [pygame] Python and Speed
On Thu, Apr 17, 2008 at 2:21 PM, Greg Ewing [EMAIL PROTECTED] wrote:
[... the asm optimization discussion quoted above ...]

I see a slight problem with these architecture-specific optimisations: once you release your game to the general public, even if you were using super-optimized awesomeness and everything ran fine, the majority of people will be running whatever version of the library came with their OS rather than compiling their own, and it might be somewhat lacking.
Re: [pygame] [BUG] pixelarray_test.py reports errors then hangs for Python 2.4.4
On Wed, Apr 16, 2008, Lenard Lindstrom wrote:
> Marcus von Appen wrote:
>> On Wed, Apr 16, 2008, Rene Dudfield wrote:
>>> hey, you can make a unittest, and then commit to svn.
>> If you tell me what to test for ;-). The bug sounds more like something happening elsewhere mashing up some memory addresses or so and causing later tests to fail.
> Sorry, this BUG is a false alarm. When I recompiled Pygame SVN I didn't install it for Python 2.4, so the Pygame 1.8.1pre tests were run with Pygame 1.8.0release. The tests pass for Pygame 1.8.1pre and Python 2.4.

Not necessarily. Though the crash might have been caused by that as well, I discovered another bug in the subscript implementation, which might try to write to memory outside of the pixel memory. Fixed in rev. 1220.

Regards
Marcus
Re: Re: [pygame] Python and Speed
> I see a slight problem with these architecture specific optimisations: once you release your game to the general public [...] the majority of people will be running whatever version of the library came with their OS rather than compiling their own, and it might be somewhat lacking.

That's a good point; so everyone should run Gentoo! ;)

As for memory management, whatever became of PyMalloc? I haven't heard much about it in a good while.

-- This, from Jach.
Re: [pygame] Python and Speed
On Apr 16, 2008, at 6:36 PM, Ian Mallett wrote:
> Recently, I've been having issues in all areas of my programming experience where Python is no longer fast enough to fill the need adequately. I really love Python and its syntax (or lack of), and all the nice modules (such as PyGame) made for it. Are there any plans to improve Python's speed to at least the level of C languages? I feel like Python is living down to its namesake's pace... Ian

Python is slow; it is an interpreted language geared toward rapid application development at the expense of execution speed. It is also designed to be highly portable, again at the potential expense of execution speed. That said, much effort is put into making Python perform well, and it is certainly possible to write extremely fast Python programs when an algorithm is available that lets you make clever use of data structures to get the job done more efficiently. Optimization is as much about data structures as it is about code (maybe more). Python comes with tools (cProfile in particular) that let you figure out where your code is spending its time so you can speed it up. Note that by removing inefficient code, or choosing a better algorithm, it is possible to get massive speedups in Python or any language.

Sometimes (especially in things like graphics and simulations) there is no magic algorithm to make things more efficient. Many times the best you can do is O(N), O(N^2) or even O(2^N). In these cases Python runs out of steam real fast (in fact, any language runs out of steam fast with O(2^N) algorithms). This is especially pronounced in games where you have to perform operations over large arrays once per frame, which is typical in 3D graphics, physics and particle systems. What you must often do in such cases is move the inner loop of these operations into native code.

The easiest way to do this is to find a native library that already performs the operations you need. This could be a generic library (like numpy) or something more specific (like pygame or ode). This is especially easy if the library already has a Python binding; if not, you can make one using ctypes, Pyrex or C directly. The downside here is the additional dependencies. If nothing exists that does what you want, you'll need to write some native code yourself. First identify the slow code. Make sure there is no way to make it fast enough within Python (is there a better algorithm? etc.). Then move this code into an extension, written however you prefer. Note that it is fairly easy to create an extension module that defines some functions in C that can be imported into Python (especially if they just do number crunching); the most complex aspect is the memory management. Tools like Pyrex can automate that for you, but don't give you as close control over the code.

But before you do any of this, do some profiling (run your game under 'python -m cProfile') and see what's taking up the time. How much time per frame do you have for game logic? Is your logic too slow, or is the drawing too slow? Honestly, my biggest problem with pygame is not Python being too slow, but the fill rate of SDL being too slow. I can solve the former with optimization and extensions, but the latter requires much more compromise (lower res, smaller sprites, etc.). The hard reality is that the CPU is often too slow for graphics work no matter what language you use (even highly tuned SIMD asm code). That's why we have dedicated graphics hardware designed for the task. Right now SDL doesn't really take advantage of that, and that's a big limitation.

-Casey
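The profiling step Casey describes can also be done programmatically with the cProfile API, which is handy for profiling just the game loop rather than the whole program (update_particles is a made-up stand-in for a per-frame workload):

```python
import cProfile
import io
import pstats

def update_particles(points):
    # Hypothetical per-frame work: move every particle by (1, 1).
    return [(x + 1, y + 1) for x, y in points]

points = [(i, i) for i in range(10000)]

profiler = cProfile.Profile()
profiler.enable()
for frame in range(50):          # simulate 50 frames
    points = update_particles(points)
profiler.disable()

# Report the 5 most expensive entries by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

For a quick whole-program view, `python -m cProfile -s cumulative yourgame.py` gives the same kind of report without touching the code.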
Re: [pygame] Unexpected blit behavior when using BLEND_ADD
On Thu, Apr 17, 2008, Brian Fisher wrote:
> On Thu, Apr 17, 2008 at 10:41 AM, Lenard Lindstrom [EMAIL PROTECTED] wrote:
>> The documentation is vague on this. It just says that BLEND_ flags are optional. Nothing is mentioned about them being ignored.
> Even if the docs explicitly said the special flags are ignored for surfaces with alpha, I would still argue that it is useless and confusing behavior to ignore these flags when used with a surface with alpha, and the behavior (and docs) should be changed. If they use the flags they are [...]

I guess we agree that the current behaviour and docs are bad for this case. How would I expect it to work? BLEND_* behave like the current implementation but take the alpha into account as well, if it exists for both surfaces. If it exists for only one, it will be ignored. Thus e.g. BLEND_ADD would create the following results:

Both surfaces have alpha: (10, 20, 30, 40) + (20, 30, 40, 50) = (30, 50, 70, 90)
Only one has alpha: (10, 20, 30) + (20, 30, 40, 50) = (30, 50, 70 [, 50])

In the latter case the alpha value will stay the same if the destination has alpha; otherwise it will be silently ignored. If no blending flags are passed and both surfaces have alpha, the current alpha-blending implementation will be used. Otherwise the blit will behave as usual. Is such a behaviour okay for everyone?

Regards
Marcus
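Marcus's proposed BLEND_ADD semantics can be sketched per-pixel in plain Python. This models the proposal above, not pygame's current behavior; real additive blits also clamp each channel at 255, which is included here:

```python
def blend_add(src, dst):
    """Per-channel saturating add. Alpha is included only when both
    pixels carry it (the semantics proposed in the message above)."""
    n = min(len(src), len(dst))
    out = [min(src[i] + dst[i], 255) for i in range(n)]
    # If only the destination carries alpha, its alpha stays unchanged;
    # a source-only alpha is silently dropped by the min() above.
    if len(dst) > n:
        out += list(dst[n:])
    return tuple(out)

print(blend_add((10, 20, 30, 40), (20, 30, 40, 50)))  # (30, 50, 70, 90)
print(blend_add((10, 20, 30), (20, 30, 40, 50)))      # (30, 50, 70, 50)
```

Both printed results match Marcus's worked examples; the clamping only matters once a channel sum exceeds 255.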
Re: [pygame] Python and Speed
On Apr 17, 2008, at 11:59 AM, Ian Mallett wrote:
> [..] This is precisely the problem I have run into in one of my in-dev games--iterating over large arrays once per frame. Actually, it is basically a collision detection algorithm--I have two arrays, both containing 3D points. The points in one array must be tested with the points in the other to see how close they are. If they are close enough, there is a collision. Naturally, this means that for every point in one array, the other array must be iterated through and the 3D Pythagorean theorem applied to each tested point.

Note this is not the most efficient way to do this; using a partitioned space you may be able to avoid comparing most points with one another most of the time. To do this in 2D you could use quad-trees, in 3D you could use oct-trees. See: http://en.wikipedia.org/wiki/Octree

> The easiest way to do this is to find a native library that already performs the operations you need.
> This seems like a rather uncommon function--I haven't found that which I need.

Note ode already implements efficient 3D collision detection in naive code; I believe pymunk does this for 2D. pyode is a Python wrapper for ode.

FWIW, you would get better answers to your problems by asking more specific questions. If you had asked "How do I make collision detection faster?" you would have gotten much better answers than asking "How do I make Python faster?".

-Casey
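The approach Ian describes - every point in one array tested against every point in the other - looks roughly like this O(N*M) baseline (comparing squared distances avoids a sqrt per pair; the function name and data are illustrative):

```python
def naive_collisions(a, b, radius):
    """Test every 3D point in a against every 3D point in b:
    the brute-force pairwise check described above."""
    hits = []
    r2 = radius * radius  # compare squared distances: no sqrt needed
    for p in a:
        for q in b:
            dx, dy, dz = p[0] - q[0], p[1] - q[1], p[2] - q[2]
            if dx * dx + dy * dy + dz * dz <= r2:
                hits.append((p, q))
    return hits

print(naive_collisions([(0, 0, 0)], [(3, 4, 0), (10, 10, 10)], 5))
# [((0, 0, 0), (3, 4, 0))]
```

With 1000 points per array this is already a million inner-loop iterations per frame, which is exactly where a spatial partition pays off.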
Re: [pygame] Python and Speed
On Apr 17, 2008, at 12:15 PM, Casey Duncan wrote: Note ode already implements efficient 3D collision detection in naive code, I believe pymunk does this for 2D. pyode is a python wrapper for ode. heh, I meant to say native code 8^) -Casey
Re: [pygame] Python and Speed
On Thu, Apr 17, 2008 at 12:15 PM, Casey Duncan [EMAIL PROTECTED] wrote:
> Note this is not the most efficient way to do this; using a partitioned space you may be able to avoid comparing most points with one another most of the time. To do this in 2D you could use quad-trees, in 3D you could use oct-trees. See: http://en.wikipedia.org/wiki/Octree

Yes, I've tried this, but there are issues with points near the boundary between two partitions. For example, if the collision radius is 5, and a point is 3 away from the edge, then all the points in the neighboring trees must be tested as well.

> Note ode already implements efficient 3D collision detection in naive code; I believe pymunk does this for 2D. pyode is a Python wrapper for ode.

I'll look into it.

> FWIW, you would get better answers to your problems by asking more specific questions. If you had asked "How do I make collision detection faster?" you would have gotten much better answers than asking "How do I make Python faster?".

Well, my question was actually "How can Python be made faster?" The collision detection was an example of where it is a problem.

Ian
Re: [pygame] Python and Speed
On Wed, Apr 16, 2008 at 7:30 PM, Ian Mallett [EMAIL PROTECTED] wrote:
>> Thus it falls to you as a developer to choose your implementation strategy wisely:
> But again, this is treating the symptoms, not the problem...

I actually think the line of thinking I read in the comment above (that I shouldn't have to optimize things, because the stupid language is slow) is in fact a counterproductive attitude in application development, and misses the real point. It would be nice of course if Python ran much, much faster, but it runs slow largely because it is designed to give you the flexibility to code complex things much more easily. If you don't want that flexibility, then you need to turn it off with Pyrex and extensions and all that kind of stuff. However, sometimes that flexibility actually lets you code more efficient approaches to begin with.

Ultimately all slowness is the programmer's problem, not the tool's. If a particular tool is the best to help you solve the problem, then it should be used. With Python, coolness is always on, so it's cheap to use coolness. C++ was designed to make you not pay for anything you don't use, which means coolness is default off, which means it's really hard to use coolness.

...To get to brass tacks, though: I've found that the majority of the real slowness in _anything_ I code is due to the approach I take, and much less so due to the code. For example, pairwise collision between a set of objects. If every object needs to be checked against every object, that's an n-squared problem. Get 1000 items, that's 1,000,000 collision checks. But let's say I do object partitioning, or bucketing, or something where I maintain sortings of the objects that let me only check items against ones that are close to them, and I either get log(n) partitioning or maybe at most about 10 items per bucket (both very achievable goals). Now I only do about 10,000 (10*1000) collision checks for the same real work being done.

So let's say my Python collision code takes 100 times as long as my C++ collision code - that means if I do the optimization in Python, I can get the Python code to go just as fast as the C code without the optimization. Not only that - let's say I decide to step things up to 10,000 items with pairwise collision - now it's 100,000,000 checks versus maybe 100,000 depending on the approach - now Python can actually be 10 times faster. So now the issue becomes: what's the cost of writing the more efficient approach in Python versus writing the naive approach in C++? If you think you get enough programmer benefits from working in Python to make those two costs equal, and the performance of either is good enough, Python is the better choice. Not only that: once you've got good approaches written in Python that are stable, and you don't need the coolness/flexibility, it becomes much easier to port the stuff to C, or Pyrex it, or whatever makes it much, much faster.

On Thu, Apr 17, 2008 at 11:59 AM, Ian Mallett [EMAIL PROTECTED] wrote:
> [Casey talked about complexity]
> This is precisely the problem I have run into in one of my in-dev games--iterating over large arrays once per frame. Actually, it is basically a collision detection algorithm--I have two arrays, both containing 3D points. [...] for every point in one array, the other array must be iterated through and the 3D Pythagorean theorem applied to each tested point.

Sounds like your approach is O(N^2). If most points aren't close enough to collide, partitioning so that you don't even have to do the check will do wonders.
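Brian's bucketing idea can be sketched as a simple uniform grid, which is the easiest spatial partition to write (the cell size is a made-up choice; picking it no smaller than the collision radius means checking the 27 neighbouring cells also covers Ian's worry about points near a partition edge):

```python
from collections import defaultdict

CELL = 10.0  # bucket size; must be >= the collision radius (assumption)

def bucket_key(p):
    # Map a 3D point to the integer grid cell containing it.
    return (int(p[0] // CELL), int(p[1] // CELL), int(p[2] // CELL))

def grid_collisions(a, b, radius):
    """Hash the points of b into grid cells, then test each point of a
    only against the 27 cells around it instead of all of b."""
    grid = defaultdict(list)
    for q in b:
        grid[bucket_key(q)].append(q)
    r2 = radius * radius
    hits = []
    for p in a:
        kx, ky, kz = bucket_key(p)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for q in grid.get((kx + dx, ky + dy, kz + dz), ()):
                        ex, ey, ez = p[0] - q[0], p[1] - q[1], p[2] - q[2]
                        if ex * ex + ey * ey + ez * ez <= r2:
                            hits.append((p, q))
    return hits

print(grid_collisions([(0, 0, 0)], [(3, 4, 0), (100, 0, 0)], 5))
# [((0, 0, 0), (3, 4, 0))]
```

With points spread out, each query touches only a handful of buckets, which is where the 1,000,000-checks-down-to-10,000 figure in the message comes from.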
Re: [pygame] Python and Speed
On Apr 17, 2008, at 12:26 PM, Ian Mallett wrote:
> On Thu, Apr 17, 2008 at 12:15 PM, Casey Duncan [EMAIL PROTECTED] wrote:
>> Note this is not the most efficient way to do this; using a partitioned space you may be able to avoid comparing most points with one another most of the time. To do this in 2D you could use quad-trees, in 3D you could use oct-trees. See: http://en.wikipedia.org/wiki/Octree
> Yes, I've tried this, but there are issues with points near partition boundaries. For example, if the collision radius is 5, and a point is 3 away from the edge, then all the points in the neighboring trees must be tested.

Partitioned space is certainly a more complex algorithm, but so long as all of your points (spheres?) are not close together, it is usually vastly more efficient. If the partition size is optimal, then the vast majority of particles will not be hitting the edge of a partition; that will be an edge case (pun intended). Even for those that are, it's usually still faster than the naive O(N^2) method that compares every point with every other. This algorithm is only effective if the space is large relative to the collision geometries and they tend not to be clumped very close together.

-Casey
Re: [pygame] Python and Speed
On Thu, Apr 17, 2008 at 12:39 PM, Brian Fisher [EMAIL PROTECTED] wrote: I actually think the line of thinking I read in the comment above (thinking that I shouldn't have to optimize things, because the stupid language is slow) is in fact a counterproductive attitude in application developement, and misses the real point. Obviously it is the problem of the programmer--it is the programmer who programs his program, not Python. Python just executes it. But the fact is, it makes a programmer's job easier if he has unlimited power to work with. Currently, I find myself having to stretch Python's limits, and, as you say, find optimizations. Programming is fun, but rewriting code so that you can meet a reasonable performance benchmark is not. It would be nice of course if python ran much much faster, but it runs slow largely because it is designed to give you the flexibility to code complex things much easier. I like this aspect of Python--its flexibility, but I object to the lack of speed. I want it all--flexibility and speed. If you don't want that flexibility, then you need to turn it off with pyrex and extensions and all that kind of stuff. I actually can't think of any situation where I would really need to do that. However sometimes that flexibility actually lets you code more efficient approaches to begin with. ...because the code is more clear. Better looking code usually runs faster, because clear code allows one to see any performance sucking bugs... Ultimately all slowness is the programmers problem, not the tools[']. Of course. The programmer is the one who makes the program. The users would complain to the programmer, not Python, and, uh, they do. If a particular tool is the best to help you solve the problem, then it should be used. With python, coolness is always on, so it's cheap to use coolness. C++ was designed to make you not pay for anything you don't use, which means coolness is default off, which means it's really hard to use coolness. 
...to get to brass tacks though, I've found that the majority of the real slowness in _anything_ I code is due to the approach I take, and much less so due to the code. For example, pairwise collision between a set of objects. If every object needs to be checked against every object, that's an n-squared problem. Get 1000 items, that's 1,000,000 collision checks. But lets say I do object partitioning, or bucketing or something where I maintain sortings of the objects in a way that lets me only check items against ones that are close to it, and I either get log(n) partitioning or maybe I get at most about 10 items per bucket (both very achieveable goals). Now it means I only do about 10,000 (10*1000) collision checks for the same real work being done. This is work that must be done. To do this in my case would be somewhat complicated, as I would need interpartition testing, boundary testing on the partitions and on each point, and various other modifications. Of course the code could be made faster, but this is something that I would have to do to get this program functioning at a good speed. Why not make Python faster, making such annoying modifications unnecessary and speeding up all Python in the process? So lets say that my python collision code takes 100 times as long as my c++ collision code - that means if I do the optimization in python, I can get the python code to go just as fast as the C code without the optimization. Not only that - lets say I decide I want to step stuff up to 10,000 items with pairwise collision - now it's 100,000,000 checks vs. like say 100,000 based on the approach - now python can actually be 10 times faster. That's an optimization which takes time and effort to implement. A C programmer very often has no need to do such optimizations, though he works with code I find horrid by comparison. So now the issue becomes whats the cost of writing the more efficient approach in python code vs. writing the naive approach in c++ code. 
If you think you get enough programmer benefits from working in python to make those 2 costs equal, and the performance of either is good enough, python is the better choice. Not only that, once you've got good approaches written in python that are stable and you don't need the coolness/flexibility, it becomes much easier to port the stuff to C, or pyrex it or whatever makes it much, much faster. The whole point of using Python, for me, is that it is far more flexible and programmer-friendly than anything else I've seen. I don't want to have to make a choice between Python and C just on a matter of speed--Python should be the clear choice. They should be equal in speed, but Python is easier to use. Obvious choice? Python. Sounds like your approach is O(N^2). If most points aren't close enough to do the collision, partitioning to make it so you don't even have to do the check will do wonders. Again, this problem is merely an example.
Re: [pygame] Python and Speed
Hi! I am just making an observation on this and objects, maybe I am missing the point, but when checking collisions, if you know your object's size, the vertex, or point depending on direction, could you not solve this by just the direct line between the 2 objects and not the surface? What I mean is, you are in control of your game, you know your objects, thus, you also know the point that will be hit depending on the direction you are traveling in, so why not just check that point or small collection of points? That is what I would do when running this kind of game. My Star Trek game, for instance, does not use the actual pixel, but the area in which it is, and that is a larger area. But when using pixel points, then only the points that fall in line with the direction you are traveling in. That seems to me to be a much faster check; little to almost a single point is what seems to be the result of this... In other words, you know your objects and where they are, then just draw a straight line between them and calculate the edge at that line drawn. I am not saying draw a line, just calculate to the edge of the object from both... Bruce On Apr 17, 2008, at 12:26 PM, Ian Mallett wrote: On Thu, Apr 17, 2008 at 12:15 PM, Casey Duncan [EMAIL PROTECTED] wrote: Note this is not the most efficient way to do this, using a partitioned space you may be able to avoid comparing most points with one another most of the time. To do this in 2D you could use quad-trees, in 3D you could use oct-trees. See: http://en.wikipedia.org/wiki/Octree Yes, I've tried this, but there are issues with points being in two separate places. For example, if the collision radius is 5, and it is 3 away from the edge, then all the points in the neighboring trees must be tested. Partitioned space is certainly a more complex algorithm, but so long as all of your points (spheres?) are not close together, it is usually vastly more efficient. 
If the partition size is optimal, then the vast majority of particles will not be hitting the edge of a partition; that will be an edge case (pun intended). Even for those that are, it's usually still faster than the naive O(N^2) method that compares every point with every other. This algorithm is only effective if the space is large relative to the collision geometries and they tend not to be clumped very close together. -Casey
Re: [pygame] Python and Speed
On Thu, Apr 17, 2008 at 12:45 PM, Casey Duncan [EMAIL PROTECTED] wrote: Partitioned space is certainly a more complex algorithm, but so long as all of your points (spheres?) Yes, one can think of one array as points and the other as spheres. are not close together, it is usually vastly more efficient. If the partition size is optimal, then the vast majority of particles will not be hitting the edge of a partition; that will be an edge case (pun intended). Even for those that are, it's usually still faster than the naive O(N^2) method that compares every point with every other. This algorithm is only effective if the space is large relative to the collision geometries and they tend not to be clumped very close together. In 3D space, there is a good deal of room, so I can see this being effective. But again, this is only one example of countless slowdowns I've had. Let's fix all of them in one fell swoop, no? The program is slowing down because the computer is processing the program relatively inefficiently. This is due to Python. Programmers shouldn't have to optimize their inefficiently executed code; the code should just be executed efficiently. -Casey Ian
Re: [pygame] Python and Speed
I'm not sure precisely what you mean... Again, remember that this is an example, not the question. The question is "How can Python be made faster?" This is an example of one of the problems resulting from Python's relative slowness. Here's the example: -There is a list of 3D points -There is another list of 3D points. -Every frame, for every point in the first list, if any point in the second list is a certain 3D distance away, then there is a collision. Ian
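The example Ian describes, written naively, is a straightforward double loop (function and variable names here are hypothetical, just to make the cost concrete):

```python
import math


def naive_collisions(points_a, points_b, radius):
    """O(len(a) * len(b)): every point in the first list is tested
    against every point in the second list, every frame."""
    hits = []
    for ax, ay, az in points_a:
        for bx, by, bz in points_b:
            dist = math.sqrt((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2)
            if dist <= radius:
                hits.append(((ax, ay, az), (bx, by, bz)))
    return hits


print(naive_collisions([(0.0, 0.0, 0.0)],
                       [(1.0, 0.0, 0.0), (9.0, 9.0, 9.0)],
                       radius=2.0))
```

With 1000 points in each list this inner body runs 1,000,000 times per frame, which is why the thread keeps coming back to better algorithms rather than a faster interpreter.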
Re: [pygame] Python and Speed
In that specific case, no matter which programming language you use, your code will not be very fast. Do you think programmers write unoptimized code in C, and get speedy execution every time? Have you not ever used a program or played a game which ran slower than it should, which was actually programmed in a fast language? Optimizing code and improving algorithms have been around far longer than Python, and are an important part of programming in general. As has been mentioned before, there have been many attempts to optimize core Python, which have resulted in some improvements. Python 2.5 is considerably faster than Python 1.0. However, due to the nature of the language it can only be optimized so much. The best bet really, is to write the code that needs to be fast in C, and call that code from Python, using Python as a glue language. This can be accomplished using implementations that already exist (pyode, numpy) or writing a new implementation and exposing it with pyrex or other linking programs. There is no magic bullet that will make Python faster, and it's not for lack of trying. Even at the most optimized it could theoretically be, I don't think pure Python will ever be as fast as C. I do hope it gets close, but even if this were to be the case, your collision detection code would still be slow as heck. Culling algorithms for this purpose were invented to speed up applications written in C, after all :)
Re: [pygame] Python and Speed
On Thu, 17 Apr 2008, you wrote: No, this is the place to discuss it because if we wish to make games, work with existing platforms, and want speed, that is the way to go. Now that we have had this discussion, and found solutions, now we have a list of ways to resolve it. Sure, discuss ways to work with the current interpreter, but it's quite pointless talking about changes to the interpreter itself. I'm almost certain none of the core devs are on this list, and the python-dev list is a much more appropriate place to discuss it anyway :) Richard
Re: [pygame] Unexpect blit behavior when using BLEND_ADD
Marcus von Appen wrote: Should we change this to take alpha into account for the BLEND_* args as well and only apply the default alpha blending, if no BLEND_* args are passed? Maybe what's needed is another set of flags to specify what transfer function to use for the alpha channel, independently of the RGB channels. A good place to look for ideas on this would be the set of blending functions specified by OpenGL. It seems to be fairly comprehensive and is based on a lot of experience of what people have found useful. In particular, it has modes which specify the alpha combining function separately, which seems to be what is needed here. -- Greg
Re: [pygame] Python and Speed
Patrick Mullen wrote: Also, if you are using sqrt for your distance check, you are likely wasting cpu cycles, if all you need to know is whether they are close enough. Also note that if you do need to compare exact Euclidean distances for some reason, you can avoid square roots by comparing the squares of the distances instead of the distances themselves. -- Greg
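Greg's squared-distance trick looks like this in code (a small sketch; the function name is just for illustration):

```python
def close_enough(p, q, radius):
    """Compare squared distance against squared radius: same result
    as comparing distances, but no sqrt per pair."""
    dx, dy, dz = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    return dx * dx + dy * dy + dz * dz <= radius * radius


print(close_enough((0, 0, 0), (3, 4, 0), 5))  # True: 9 + 16 = 25 <= 25
```

This works because squaring is monotonic for non-negative values, so the comparison is exact, not an approximation - it does not change the margin of error at all.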
Re: [pygame] Python and Speed
Ian Mallett wrote: Here's the example: -There is a list of 3D points -There is another list of 3D points. -Every frame, for every point in the first list, if any point in the second list is a certain 3D distance away, then there is a collision. The responses to it also provide a good example of a very important principle: Often, using a better algorithm can give you much greater gains than speeding up the implementation of the one you're using. So when something is too slow, the first thing you should ask is: Does there exist a better algorithm for what I'm trying to do? -- Greg
Re: [pygame] Python and Speed
René Dudfield wrote: - SIMD instructions are the fast ones... It's doubtful there's much in the Python core that would benefit from SIMD, though. Most of what it does doesn't involve doing repetitive operations on big blocks of data. -- Greg
Re: [pygame] Python and Speed
Ian Mallett wrote: How do you write an extension module in C and call it from Python? Someone gave some instructions earlier, but I found them too vague... Another way I forgot to mention earlier is to use the ctypes module (I often forget about it, because it wasn't in the stdlib until very recently.) That allows you to call compiled routines from a shared library directly without having to write any C. It's less efficient, though, as it has to go through some Python wrapper objects to get there, and also more dangerous, because you can easily crash the interpreter if you don't get everything exactly right. -- Greg
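The ctypes route Greg describes can be seen in a few lines. A minimal sketch, assuming a Unix-like system where the C math library is available as a shared library (on Windows the routines live in the C runtime DLL instead):

```python
import ctypes
import ctypes.util

# Locate and load the C math library (the fallback name is the
# common glibc soname; an assumption, adjust for your platform).
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# ctypes assumes int arguments and return values unless told
# otherwise - getting these wrong is exactly how you crash the
# interpreter, as Greg warns above.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```

The same pattern works for any C function you compile into your own shared library: load it with `ctypes.CDLL`, declare `argtypes`/`restype`, and call it as if it were Python.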
Re: [pygame] Python and Speed
Ian Mallett wrote: Yes, I've tried this, but there are issues with points being in two separate places. For example, if the collision radius is 5, and it is 3 away from the edge, then all the points in the neighboring trees must be tested. Rather than a tree, you may be just as well off using a regular array of cells. That makes it much easier to find the neighbouring cells to test, and there's also less overhead from code to manage and traverse the data structure. The only time you would really need a tree is if the distribution of the objects can be very clumpy, so that you benefit from an adaptive subdivision of the space. Another possibility to consider is instead of testing neighbouring cells, insert each object into all cells that are within the collision radius of it. That might turn out to be faster if the objects don't move very frequently. -- Greg
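Greg's second suggestion (insert each object into every cell its collision radius overlaps, so a query only has to look in one cell) might look like this sketch, in 2-D, with illustrative sizes:

```python
from collections import defaultdict

CELL = 10.0
RADIUS = 5.0


def cells_covering(p, radius):
    """All cell coordinates the object's collision circle overlaps."""
    x0, x1 = int((p[0] - radius) // CELL), int((p[0] + radius) // CELL)
    y0, y1 = int((p[1] - radius) // CELL), int((p[1] + radius) // CELL)
    return [(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]


def build_grid(objects):
    """Each object is registered in several cells once..."""
    grid = defaultdict(list)
    for p in objects:
        for key in cells_covering(p, RADIUS):
            grid[key].append(p)
    return grid


def candidates(point, grid):
    """...so each query only checks the point's own cell."""
    return grid.get((int(point[0] // CELL), int(point[1] // CELL)), [])


grid = build_grid([(8.0, 8.0), (90.0, 90.0)])
print(candidates((12.0, 3.0), grid))  # [(8.0, 8.0)]
```

The trade-off Greg notes is visible here: insertion does more work (several cells per object), lookup does less, so it pays off when objects move rarely relative to how often they are queried.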
Re: [pygame] Python and Speed
ctypes is great stuff! I find it much harder to crash the interpreter with ctypes than with extensions I've developed and debugged. It is quite resilient. I've used it to interface with the Windows API to simulate keystrokes, to interface to a USB Digital IO interface, in a wrapper for the huge OpenCV library, to set the background image on my desktop, to adjust the system volume control, to interface to the Wiimote, and to wrap eSpeak to name a few. I've pretty much given up on swig and pyrex. gb Greg Ewing wrote: Ian Mallett wrote: How do you write an extension module in C and call it from Python? Someone gave some instructions earlier, but I found them too vague... Another way I forgot to mention earlier is to use the ctypes module (I often forget about it, because it wasn't in the stdlib until very recently.) That allows you to call compiled routines from a shared library directly without having to write any C. It's less efficient, though, as it has to go through some Python wrapper objects to get there, and also more dangerous, because you can easily crash the interpreter if you don't get everything exactly right.
Re: [pygame] Python and Speed
Hi! Another thought, same as before but adding the other comment about bins. If your object is heading in a certain direction and you know the surface point of that object, now make a dictionary key point for it. Same for all other objects, knowing their direction. key=str(x)+str(-y)+str(-z) #Keeping all as integer values in a coordinate grid. The only thing you compare are the keys. Then you will say, wait a minute, they are not both going to collide at that point! OK, then you know the direction of both and the point where they would in fact collide. Now that point will be approaching both at the same point in that straight line. The vertex of the point may change where they intersect, but keep taking the key value of that point, because both will always have that vertex point on that point on the surface. Keep updating the dictionary key and compare the value for both with the if key in... That point is a key value and easy to check. It is not the surface, just the intersection point of both straight lines of the objects traveling through free space... So you will be checking 2 keys for 2 objects, 3 keys for 3 objects, and so on... The only addition to this is if you have an object that is not a perfect shape, like a boulder, where that outer edge will change depending on the angle of your straight line to the vertex or intersection point. Not checking surfaces, just checking the intersection point of the line, for both will have to meet, and the distance to that object's surface will also be that straight-line distance from the object center, which will still direct you to the vertex/intersection point. Both surface points that meet, having the same key value, will also be the collision point when they do in fact collide... So all points will match. Just another observation and thought in this faster and faster check of objects. For I do something like this with my primitive Battleship game. Instead of an array, just keys for where an object part is located. 
When the missile or shell hits that key it matches the object's location at that same key. No completely filled huge array, just points. When doing this you have up to 3 points to calculate: the point on the surface of object 1, the point on the surface of object 2, and then the intersection point of the vectors of both objects. When all 3 points match with the same key, you have your collision! Like I said before, what you are updating is the vector angle, the surface location, and the intersection point of that straight line of the vectors of both objects. Bruce
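The core of Bruce's scheme, comparing dictionary keys built from integer grid coordinates instead of doing distance math, can be sketched like this. (A tuple key is used here instead of the string concatenation in his post, since str(1)+str(12) and str(11)+str(2) would produce the same key; all names are illustrative.)

```python
def grid_key(pos, cell=1.0):
    """Snap a position to integer grid coordinates, used as a
    hashable dictionary key."""
    return (int(pos[0] // cell), int(pos[1] // cell), int(pos[2] // cell))


# Register each object part under the key for its location,
# Battleship-style: no huge filled array, just occupied points.
occupied = {}
occupied[grid_key((10.2, 4.7, 0.0))] = "ship"

# A shell landing anywhere in the same grid cell matches with a
# single dict lookup - no per-pair distance computation.
shell = (10.9, 4.1, 0.4)
if grid_key(shell) in occupied:
    print("hit:", occupied[grid_key(shell)])
```

This is really a coarse spatial hash: it trades exactness for speed, since anything inside the same cell counts as a hit regardless of the true distance.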
Re: [pygame] Python and Speed
Ian Mallett wrote: Programmers shouldn't have to optimize their inefficiently executed code, the code should just be executed efficiently. Even if your inefficient algorithm is being executed as fast as possible, it's still an inefficient algorithm, and you will run into its limitations with a large enough data set. Then you will have to find a better algorithm anyway. Part of the skill of being a good programmer is having the foresight to see where such performance problems are likely to turn up further down the track, and choosing an algorithm at the outset that at least isn't going to be spectacularly bad. Doing this saves you work in the long run, since you spend less time going back and re-coding things. -- Greg
Re: [pygame] Python and Speed
Devon Scott-Tunkin wrote: would you set out 2d partitioning the screen in pygame by making say 4 invisible sprites with rects ... or by just using x,y values Just use coordinates. Sprites are totally unnecessary, since they're not something that appears on the screen. -- Greg
Re: [pygame] Unexpect blit behavior when using BLEND_ADD
On Fri, Apr 18, 2008 at 7:36 AM, Greg Ewing [EMAIL PROTECTED] wrote: Marcus von Appen wrote: Maybe what's needed is another set of flags to specify what transfer function to use for the alpha channel, independently of the RGB channels. +1
Re: [pygame] Unexpect blit behavior when using BLEND_ADD
Brian Fisher wrote: On Thu, Apr 17, 2008 at 10:41 AM, Lenard Lindstrom [EMAIL PROTECTED] wrote: For alpha surfaces a blend may be useful for doing shadow effects. Such effects require extracting and altering the alpha plane. Want to make a shadow for antialiased text. Extract the alpha and apply to an all black surface. Right now surfarray must be used to do this in a practical way. Otherwise it would have to be done per-pixel with get_at/set_at. I'm curious - how exactly would you expect to be able to extract the alpha channel from a font char and apply it to a black surface using the blit flags? BLEND_MAX? Is there a technique that would also work for a surface with arbitrary color values? I was thinking of this: text = font.render("some text", True, some_color) alpha = pygame.Surface(text.get_size(), pygame.SRCALPHA, 32) alpha.fill((0, 0, 0, 255)) alpha.blit(text, (0, 0), None, pygame.BLEND_MIN) This assumes BLEND_MIN applies to alpha as well. If dA = sA + dA - ((sA*dA)/255) then a destination color of (0, 0, 0, 0) will work. Source color is irrelevant. It extracts alpha from any image. -- Lenard Lindstrom [EMAIL PROTECTED]
Re: [pygame] Unexpect blit behavior when using BLEND_ADD
Also, I guess the docs should be updated to specify which blend operations the flags do exactly.
Re: [pygame] Unexpect blit behavior when using BLEND_ADD
René Dudfield wrote: On Fri, Apr 18, 2008 at 7:36 AM, Greg Ewing [EMAIL PROTECTED] wrote: Marcus von Appen wrote: Maybe what's needed is another set of flags to specify what transfer function to use for the alpha channel, independently of the RGB channels. +1 This means inlined blending code will have to be replaced with indirect function calls. Otherwise, with all the possible permutations of blending, alphablit.c will balloon in size uncontrollably. -- Lenard Lindstrom [EMAIL PROTECTED]
Re: [pygame] Unexpect blit behavior when using BLEND_ADD
Lenard Lindstrom wrote: This means inlined blending code will have to be replaced with indirect function calls. Otherwise, with all the possible permutations of blending, alphablit.c will balloon in size uncontrollably. You could make two passes over the surface, one to update the RGB and one to update the alpha. Or you could have a set of blending operations that update the alpha only, and let the user make two blit calls. -- Greg
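Greg's two-pass idea can be illustrated per-pixel in plain Python on (R, G, B, A) tuples. This is only the arithmetic of the approach, not pygame's actual C blitters, and the blend modes chosen (additive RGB, max alpha) are just examples:

```python
def blend_rgb_add(src, dst):
    """Pass 1: additive blend on the colour channels only;
    the destination alpha is left untouched."""
    return (min(src[0] + dst[0], 255),
            min(src[1] + dst[1], 255),
            min(src[2] + dst[2], 255),
            dst[3])


def blend_alpha_max(src, dst):
    """Pass 2: combine only the alpha channel; colour untouched."""
    return (dst[0], dst[1], dst[2], max(src[3], dst[3]))


src, dst = (10, 20, 30, 200), (250, 100, 0, 50)
pixel = blend_alpha_max(src, blend_rgb_add(src, dst))
print(pixel)  # (255, 120, 30, 200)
```

Because each pass touches a disjoint set of channels, the two blits commute, which is what makes "let the user make two blit calls" a workable interface.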
Re: [pygame] Unexpect blit behavior when using BLEND_ADD
René Dudfield wrote: ... or we use a runtime assembler to generate the required blitters at run time ;) Or reimplement pygame on top of OpenGL... Isn't someone already working on that? -- Greg
Re: [pygame] Unexpect blit behavior when using BLEND_ADD
hi, yeah, you can do pygame on top of opengl in a few different ways. SDL 1.3/2 has opengl support built in. Also you can use things like lamina (http://pitchersduel.python-hosting.com/browser/branches/Lamina/) to use opengl with SDL 1.2.x. But, obviously opengl sucks in certain ways - like it doesn't exist at all on some computers, or is buggy - which is why we use software. So I guess this discussion is about improving the software blitters :) cu, On Fri, Apr 18, 2008 at 11:25 AM, Greg Ewing [EMAIL PROTECTED] wrote: René Dudfield wrote: ... or we use a runtime assembler to generate the required blitters at run time ;) Or reimplement pygame on top of OpenGL... Isn't someone already working on that? -- Greg
Re: [pygame] Unexpect blit behavior when using BLEND_ADD
Greg Ewing wrote: Lenard Lindstrom wrote: This means inlined blending code will have to be replaced with indirect function calls. Otherwise, with all the possible permutations of blending, alphablit.c will balloon in size uncontrollably. You could make two passes over the surface, one to update the RGB and one to update the alpha. Or you could have a set of blending operations that update the alpha only, and let the user make two blit calls. Having a separate blit_alpha might be the way to go. This may actually remove intermediary steps involving an alpha-only surface. Also, it can be abstracted into a general purpose color plane toolkit. While PixelArray is good at carving up a surface it doesn't replace Numeric or NumPy for doing blanket operations on a surface's contents. -- Lenard Lindstrom [EMAIL PROTECTED]
Re: [pygame] Python and Speed
On Thu, Apr 17, 2008 at 3:03 PM, Greg Ewing [EMAIL PROTECTED] wrote: Patrick Mullen wrote: Also, if you are using sqrt for your distance check, you are likely wasting cpu cycles, if all you need to know is whether they are close enough. Nope; in this case, the calculations' margin of error must be very small. Also note that if you do need to compare exact Euclidean distances for some reason, you can avoid square roots by comparing the squares of the distances instead of the distances themselves. I already do that. On Thu, Apr 17, 2008 at 4:02 PM, Greg Ewing [EMAIL PROTECTED] wrote: Rather than a tree, you may be just as well off using a regular array of cells. That makes it much easier to find the neighbouring cells to test, and there's also less overhead from code to manage and traverse the data structure. I meant that. What is the difference between a tree and a cell? Cells are regular? Anyway, I had planned to do cells. The only time you would really need a tree is if the distribution of the objects can be very clumpy, so that you benefit from an adaptive subdivision of the space. Another possibility to consider is instead of testing neighbouring cells, insert each object into all cells that are within the collision radius of it. That might turn out to be faster if the objects don't move very frequently. I like that idea. Still, the objects can and do move. On Thu, Apr 17, 2008 at 4:23 PM, Greg Ewing [EMAIL PROTECTED] wrote: There are an extremely large number of modifications that could be made to Python. Only a very small number of them will result in any improvement, and of those, all the easy-to-find ones have already been found. The harder ones must then be attacked--solving a difficult speed issue might save the tiresome implementation of optimizations on the part of hundreds of users. 
If you want to refute that, you're going to have to come up with an actual, specific proposal, preferably in the form of a code patch together with a benchmark that demonstrates the improvement. If you can't do that, you're not really in a position to make statements like it can't be that hard. Like I said, because I am the programmer, not the Python modifier, it is my job to make the programs run fast. By can't be that hard, I mean that if C++ can do it, Python should be able to too. Obviously, I don't know how Python is structured, and I doubt I have the experience of the people on the Python team, but if I can make optimizations in my code, they should be able to make modifications in Python. If wishing could make it so, Python would already be blazingly fast! Ian is wishing... On Thu, Apr 17, 2008 at 4:32 PM, Greg Ewing [EMAIL PROTECTED] wrote: Even if your inefficient algorithm is being executed as fast as possible, it's still an inefficient algorithm, and you will run into its limitations with a large enough data set. Then you will have to find a better algorithm anyway. Part of the skill of being a good programmer is having the foresight to see where such performance problems are likely to turn up further down the track, and choosing an algorithm at the outset that at least isn't going to be spectacularly bad. Doing this saves you work in the long run, since you spend less time going back and re-coding things. Not necessarily. I've had situations where I've decided to do something, then draft-coded it, then decided that the game feature wasn't necessary, was the wrong approach, or simply to scrap the entire project. If I had decided to spend the time to code something with all of the optimizations, it would take longer. When I delete it, that extra effort would have been worthless. Once a feature is implemented, tested, and so on, I decide if it is needed, then add optimizations and comments. 
Because optimizations often rewrite code in a more efficient way, the original code works as a guideline for the operation. All this saves me effort in the long-run; but I can't speak for anyone else. Greg Ian
Re: Re: [pygame] Python and Speed
Ian Mallett [EMAIL PROTECTED] wrote: ... if C++ can do it, Python should be able to too. Obviously, I don't know how Python is structured ... Then please learn more about CPython's internals and the general problem of optimising a dynamic language like Python. The CPython code is incredibly well-written and easy to understand, even with the various complications that exist due to current optimisations* and there's plenty of academic research into dynamic languages out there. I'm sure the core devs would welcome your patches with open arms! I'll leave you with Greg's wisdom, which perhaps needs repeating: If wishing could make it so, Python would already be blazingly fast! Richard * for example the double dict-lookup implementations that seamlessly replace the string-only lookup with an arbitrary-object lookup on the first non-string lookup, or the function-call frame caching mechanism that I had a hand in implementing...