Re: [pypy-dev] Pypy's facelift and certificate issues
http://packages.pypy.org/ Question about the web site -- does PyPy currently have anything similar to > this page for Python 3? > http://py3readiness.org/ > > I think a page like this, showing which major libraries are compatible > with PyPy, could really help drive adoption of PyPy. I know for our team, > the Python 3 page was a strong reason we felt "safe" starting to make the > switch to Python 3. > I'm not sure how we'd get this information about PyPy library > compatibility. One idea would be to install each library on PyPy, run the > automated tests, and compare the results against those for CPython. > Barry > > ___ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev
Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year
It turns out there is some work in progress in the Spark project to share its memory with non-JVM programs. See https://issues.apache.org/jira/browse/SPARK-10399. Once this is completed it should be fairly trivial to expose it to Python, and then maybe JIT integration could be discussed at that time. This is a huge step forward over sharing Java objects. From the title of the ticket it appears it would be a C++ interface, but looking at the pull request it looks like it will be a C interface. In the end the blocker may just come down to PyPy having complete support for NumPy. Without NumPy the success of this would be somewhat limited based on user expectations, and without PyPy it may be too slow for many applications. On Thu, Mar 24, 2016 at 1:11 PM, John Camara wrote: > Hi Armin, > > At a minimum tighter execution is required as well as sharing memory. But > on the other hand you have raised the bar so high with cffi, having a clean > and unbloated interface, that it would be nice if a library with a similar > spirit existed for Java. Having support in PyPy's JIT to remove all the > marshalling types would be a big plus on top of the shared memory, and > some integration between the 2 GCs would likely be required. > > Maybe the best approach would be a combination of existing libraries and a > new interface that allows for sharing of memory. Maybe similar to numpy > arrays with a better API that avoids the pitfalls of numpy relying on > CPython semantics/implementation details. After all the only thing that > needs to be eliminated is the copying/serialization of large data > arrays/structures. > > John
Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year
Hi Armin, At a minimum tighter execution is required as well as sharing memory. But on the other hand you have raised the bar so high with cffi, having a clean and unbloated interface, that it would be nice if a library with a similar spirit existed for Java. Having support in PyPy's JIT to remove all the marshalling types would be a big plus on top of the shared memory, and some integration between the 2 GCs would likely be required. Maybe the best approach would be a combination of existing libraries and a new interface that allows for sharing of memory. Maybe similar to numpy arrays with a better API that avoids the pitfalls of numpy relying on CPython semantics/implementation details. After all the only thing that needs to be eliminated is the copying/serialization of large data arrays/structures. John On Thu, Mar 24, 2016 at 12:20 PM, Armin Rigo wrote: > Hi John, > > On 24 March 2016 at 13:22, John Camara wrote: > > (...) Thus the need for a jffi library. > > When I hear "a jffi library" I'm thinking about a new library with a > new API. I think what you would really like instead is to keep the > existing libraries, but adapt them internally to allow tighter > execution of the Python and Java VMs. > > I may be completely wrong about that, but you're also talking to the > wrong guys in the first place :-) > > > A bientôt, > > Armin.
Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year
Hi Fijal, I understand where you're coming from and I'm not trying to convince you to work on it. I'm just mainly trying to point out a need that may not be obvious to this community. I don't spend much time on big data and analytics so I don't have a lot of time to devote to this task. That could change in the future, so you never know, I may end up getting involved with this. At the end of the day I think it is the PSF which needs to do an honest assessment of the current state of Python and of programming in general, so that they can help direct the future of Python. I think with an honest assessment it should be clear that it is absolutely necessary that a dynamic language have a JIT. Otherwise, a language like Node would not be growing so quickly on the server side. An honest assessment would conclude that Python needs to play a major role in big data and analytics, as we don't want this to be another area where Python misses the boat. As with all languages other than JavaScript, we missed playing an important role on the web front end. More recently we missed out on mobile. I don't think it is good for us to miss out on big data. It would be a shame since we had such a strong scientific community which initially gave us a huge advantage over other communities. Missing out on big data might also be the driver that moves the scientific community in a different direction, which would be a big loss to Python. I personally don't see any particular companies or industries that are willing to fund the tasks needed to solve these issues. It's not to say there are no more funds for Python projects; it's just that likely no one company will be willing to fund these kinds of projects on their own. It really needs the PSF to coordinate these efforts, but they seemed to be more focused on trying to make Python 3 a success instead of improving the overall health of the community. 
I believe that Python is in pretty good shape in being able to solve these issues but it just needs some funding and focus to get there. Hopefully the workshop will be successful and help create some focus. John On Thu, Mar 24, 2016 at 8:56 AM, Maciej Fijalkowski wrote: > Hi John > > Thanks for explaining the current situation of the ecosystem. I'm not > quite sure what your intention is. PyPy (and CPython) is very easy to > embed through any C-level API, especially with the latest additions to > cffi embedding. If someone feels like doing the work to share stuff > that way (as I presume a lot of data presented in JVM can be > represented as some pointer and shape how to access it), then he's > obviously more than free to do so, I'm even willing to help with that. > Now this seems like a medium-to-big size project that additionally > will require quite a bit of community will to endorse. Are you willing > to volunteer to work on such a project and dedicate a lot of time to > it? If not, then there is no way you can convince us to volunteer our > own time to do it - it's just too big and quite a bit far out of our > usual areas of interest. If there is some commercial interest (and I > think there might be) in pushing python and especially pypy further in > that area, we might want to have a better story for numpy first, but > then feel free to send those corporate interest people my way, we can > maybe organize something. If you want us to do community service to > push Python solutions in the area I have very little clue about > however, I would like to politely decline. > > Cheers, > fijal > > On Thu, Mar 24, 2016 at 2:22 PM, John Camara > wrote: > > Besides JPype and PyJNIus there is also https://www.py4j.org/. I > haven't > > heard of JPype being used in any recent projects so I assuming it is > > outdated by now. PyJNIus gets used but I tend to only see it used on > > Android projects. 
The Py4J project gets used often in > numerical/scientific > > projects mainly due to its use in PySpark. The problem with all these > > libraries is that they don't have a way to share large amounts of memory > > between the JVM and Python VMs and so large chunks of data have to be > > copied/serialized when going between the 2 VMs. > > > > Spark is the de facto standard in cluster computing at this point in > > time. At a high level Spark executes code that is distributed > throughout a > > cluster so that the code being executed is as close as possible to where > the > > data lives so as to minimize transferring of large amounts of data. The > > code that needs to be executed is packaged up into units called > Resilient > > Distributed Datasets (RDDs). RDDs are lazily evaluated and are essentially > graphs > > of the operations that need to be performed on the data. They are > capable > >
Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year
What it strongly lacks today is the connection to C/legacy code and numerical/scientific modules, and of course it also does not have a solution to the data copying overhead with the JVM. Anyway, this is just my 2 cents on what is currently holding Python back from taking off in this space. On Thu, Mar 24, 2016 at 2:32 AM, Hakan Ardo wrote: > > On Mar 23, 2016 21:49, "Armin Rigo" wrote: > > > > Hi John, > > > > On 23 March 2016 at 19:16, John Camara wrote: > > > I would like to suggest one more topic for the workshop. I see a big > need > > > for a library (jffi) similar to cffi but that provides a bridge to Java > > > instead of C code. The ability to seamlessly work with native Java > data/code > > > would offer a huge improvement (...) > > > > Isn't it what JPype does? Can you describe how it isn't suitable for > > your needs? > > There is also PyJNIus: > > https://pyjnius.readthedocs.org/en/latest/
Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year
Hi Fijal, I agree that jffi would be both a large project and, without someone leading it, it would likely not get anywhere. But I tend to disagree that it would be a separate goal for the conference. I realize the goal of the summit is to talk about native-code compilation for Python, and most would argue that means executing C code, assembly, or at the very least executing code at the speed of "C code". But the reality now is that numerical/scientific programming increasingly needs to execute in a clustered environment. So I think we need to be careful to not only solve yesterday's problems but to make sure we are also covering current and future ones. Today, big data and analytics, which is driving most numerical/scientific programming, is becoming almost exclusively run in a clustered environment, with the Apache Spark ecosystem as the de facto standard. A few years back, Python's ace up its sleeve for the scientific community was the numpy/scipy ecosystem, but we have recently lost that edge by falling behind in clustered computing. At this point in time our best move forward on the numerical/scientific fronts is to become best buddies with the Spark ecosystem and make sure we can bridge the numpy/scipy ecosystem to it. That is, we merge the best of both worlds and suddenly Python becomes the go-to language again for numerical/scientific computing. Of course we still need to address what should have been yesterday's problem and deal with the "native-code compilation" issues. John On Wed, Mar 23, 2016 at 2:47 PM, Maciej Fijalkowski wrote: > Hi John > > I understand why you're bringing this up, but it's a huge project on > its own, worth at least a couple months' worth of work. Without a > dedicated effort from someone I'm worried it would not go anywhere. > It's kind of separated from the other goals of the summit > > On Wed, Mar 23, 2016 at 8:16 PM, John Camara > wrote: > > Hi Nathaniel, > > > > I would like to suggest one more topic for the workshop. 
I see a big need > > for a library (jffi) similar to cffi but that provides a bridge to Java > > instead of C code. The ability to seamlessly work with native Java > data/code > > would offer a huge improvement when Python code needs to work with the > > Spark/Hadoop ecosystem. The current mechanisms which involve serializing > > data to/from Java can kill performance for some applications and can > render > > Python unsuitable for these cases. > > > > John
[pypy-dev] [ANN] Python compilers workshop at SciPy this year
Hi Nathaniel, I would like to suggest one more topic for the workshop. I see a big need for a library (jffi) similar to cffi but that provides a bridge to Java instead of C code. The ability to seamlessly work with native Java data/code would offer a huge improvement when Python code needs to work with the Spark/Hadoop ecosystem. The current mechanisms, which involve serializing data to/from Java, can kill performance for some applications and can render Python unsuitable for these cases. John
Re: [pypy-dev] vmprof compression
I meant to mention them in my email, as both of them are great options when you don't mind sacrificing some compression for significant improvements in compression and decompression speeds. These libraries are I/O bound when saving to a hard drive unless you are using a very low-powered processor. Generally the compression ratio is 0.5-0.75 of that achieved by gzip. Compression speeds can approach 0.5 GB/s. These libraries don't offer any advanced compression techniques, so anything you do to help create long strings of 0s and 1s, like compressing the deltas as I mentioned in the earlier email, will go a long way toward significantly improving the compression ratio while also maintaining high performance. On Fri, Mar 27, 2015 at 9:15 PM, Leonardo Santagada wrote: > snappy and lz4 are good algos to try too.
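The speed-versus-ratio tradeoff described above can be seen even with the standard library; snappy and lz4 are third-party packages, so this sketch uses zlib's compression levels as a stand-in for the fast-but-lighter versus slow-but-denser choice (the payload is made up for illustration):

```python
import time
import zlib

# Toy payload with lots of repetition, standing in for profile data.
payload = b"stack_id=4242 depth=17 count=3 " * 20000

for level in (1, 6, 9):
    t0 = time.perf_counter()
    out = zlib.compress(payload, level)
    elapsed = time.perf_counter() - t0
    # Higher levels spend more CPU time chasing a smaller output.
    print("level", level, "size", len(out), "time %.2f ms" % (elapsed * 1000))
```

The same experiment with a real snappy or lz4 binding would show much higher throughput at a correspondingly lower ratio, which is the point being made in the post.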
[pypy-dev] vmprof compression
Hi Fijal, To recap and continue the discussion from IRC. We already discussed that the stack ids are based on a counter, which is good, but I also want to confirm that the ids have locality associated with the code. That is, similar areas of the code will have similar ids. Just to make sure they are not random with respect to the code, otherwise compression will not be helpful. If the ids are random, that would need to be corrected first. Right now the stack traces are written to the file repeating the following sequence: MARKER_STACKTRACE count depth stack stack ... stack. In order to get a high compression ratio it would be better to combine multiple stack traces and rearrange the data as follows: MARKER_COMPRESSED_STACKTRACES counts_compressed_length counts_compressed depths_compressed_length depths_compressed stacks_compressed_length stacks_compressed. In order to build the compressed data you will want 3 pairs of buffers: a pair for counts, a pair for depths, and a pair for stacks. Your profiler would be writing to one set of buffers and another thread would be responsible for compressing buffers that are full and writing them to the file. Once a set of buffers is full the profiler would start filling up the other set of buffers. For each set of buffers you need a variable to hold the previous count, depth, and stack id. They will be initialized to 0 before any data is written to an empty buffer. Instead of writing the actual count value into the counts buffer you will write the difference between the current count and the previous count. The reason for doing this is that the delta values will mostly be around 0, which will significantly improve the compression ratio without adding much overhead. Of course you would do the same for depths and stack ids. When you compress the data you compress each buffer individually to make sure like data is being compressed. 
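The delta idea can be sketched in pure Python. This is a toy illustration, not vmprof's actual on-disk format, and the helper names (`encode_deltas`, `decode_deltas`, `to_bytes`) are hypothetical:

```python
import random
import zlib

def encode_deltas(values):
    """Store each value as its difference from the previous one.
    With good locality the deltas cluster near zero, which compresses
    far better than the raw values."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def decode_deltas(deltas):
    """Inverse of encode_deltas: a running sum restores the originals."""
    prev, out = 0, []
    for d in deltas:
        prev += d
        out.append(prev)
    return out

def to_bytes(values):
    # 32-bit little-endian; masking gives two's complement for negatives.
    return b"".join((v & 0xFFFFFFFF).to_bytes(4, "little") for v in values)

# Toy stack ids with locality: a slow random walk around 5000.
random.seed(0)
ids, cur = [], 5000
for _ in range(5000):
    cur += random.randint(-3, 3)
    ids.append(cur)

raw = zlib.compress(to_bytes(ids))
delta = zlib.compress(to_bytes(encode_deltas(ids)))
# The delta stream draws from only a handful of symbols, so it
# compresses dramatically better than the raw ids.
print("raw:", len(raw), "delta:", len(delta))
```

With ids that lack locality the two sizes converge, which is exactly why the post insists the counter-based ids must track the code layout.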
Like data compresses better than unlike data, and by saving deltas very few bits will be required to represent the data, and you are likely to have long strings of 0s and 1s. I'm sure now you can see why I don't want stack ids to be random. If they are random then the deltas will be all over the place, so you won't end up with long strings of 0s and 1s, and random data itself does not compress. To test this out I wouldn't bother modifying the C code but instead try it out in Python to first make sure the compression is providing huge gains and to figure out how to tune the algorithm, without having to mess with the signal handlers, writing the code for the separate thread, and dealing with issues such as making sure you don't start writing to a buffer before the thread finished writing the data to the file, etc. I would just read an existing profile file and rewrite it to a different file by rearranging the data and compressing the deltas as I described. You can get away with one set of buffers as you wouldn't be profiling at the same time. To tune this process you will need to determine the appropriate number of stack traces that is small enough to keep memory down but large enough so that the overhead associated with compression is small. Maybe start off with about 8000 stack traces. I would try gzip, bz2, and lzma and look at their compression ratios and times. Gzip is generally faster than bz2, and lzma is the slowest. On the other hand lzma provides the best compression and gzip the worst. Since you will be compressing deltas you most likely can get away with using the fastest compression options under each compressor without affecting the compression ratio. But I would test it to verify this, as it does depend on the data being compressed whether or not this is true. Also, one option that is available in lzma is the ability to set the width of the data to look at when looking for patterns. 
Since you are saving 32- or 64-bit ints depending on the platform, you can set the option to either 4 or 8 bytes based on the platform. I don't believe gzip or bz2 have this option. By setting this option in lzma you will likely improve the compression ratio. You may find that counts and depths give similar compression across the 3 compression types, in which case just use the fastest, which will likely be gzip. On the other hand maybe the stack ids will be better off using lzma. This is also another reason to separate out like data, as it gives you the option to use the fastest compressors for some data types while using others that provide better compression. I would not be surprised if this approach achieves a compression ratio better than 100x, but that will be heavily dependent on how local the stack ids are. Also don't forget about simple things like not using 64-bit ints when you can get away with smaller ones. Also, a slight variation on the above: if you find most of your deltas are < 127 you could write them out as 1 byte, and when greater than 127 write them out as a 4-byte int with the high bit set. If you do this then don't set the lzma opti
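The variable-width trick in the last paragraph can be sketched like this. The helper names are hypothetical and, as a simplification, the sketch only handles non-negative deltas below 2**31:

```python
def pack_delta(d):
    """Deltas 0..127 fit in one byte; larger values are written as
    4 bytes with the high bit of the first byte set as a marker."""
    if 0 <= d <= 127:
        return bytes([d])
    assert d < (1 << 31), "sketch only handles deltas below 2**31"
    return bytes([
        0x80 | ((d >> 24) & 0x7F),  # marker bit + top 7 bits
        (d >> 16) & 0xFF,
        (d >> 8) & 0xFF,
        d & 0xFF,
    ])

def unpack_deltas(data):
    """Inverse of pack_delta over a whole byte string."""
    out, i = [], 0
    while i < len(data):
        first = data[i]
        if first & 0x80:  # high bit set: 4-byte form
            out.append(((first & 0x7F) << 24)
                       | (data[i + 1] << 16)
                       | (data[i + 2] << 8)
                       | data[i + 3])
            i += 4
        else:             # 1-byte form
            out.append(first)
            i += 1
    return out

deltas = [0, 5, 127, 128, 70000]
packed = b"".join(pack_delta(d) for d in deltas)
print(len(packed), unpack_deltas(packed))  # 3 one-byte + 2 four-byte = 11 bytes
```

The warning in the post then makes sense: once most entries are single bytes, lzma's fixed-width filter option no longer matches the data layout.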
[pypy-dev] Question about extension support
Hi Kevin, More up-to-date information can be found on the FAQ page http://doc.pypy.org/en/latest/faq.html#do-cpython-extension-modules-work-with-pypy The best approach for PyPy is to either use a pure Python module if possible or use a cffi-wrapped extension instead of an extension that uses the CPython C API. Often CPython C API extensions are wrapping some C library. Creating a cffi wrapper for the library is actually much simpler than writing a CPython C API wrapper. Quite a few CPython C API extensions have already been wrapped for cffi, so make sure to search for one before creating your own wrapper. If you need to create a wrapper, refer to the cffi documentation at http://cffi.readthedocs.org/en/release-0.8/ Extensions wrapped with cffi are compatible with both CPython and PyPy. On CPython the performance is similar to what you would get if you used ctypes. However, under PyPy, the performance is much closer to a native C call plus the overhead for releasing and acquiring the GIL. John
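As a minimal taste of the cffi style being recommended, the following calls libc's `strlen` in ABI mode through symbols already loaded in the process. It assumes a POSIX system where `ffi.dlopen(None)` works and that the cffi package is installed; `strlen` is just a convenient symbol to demonstrate with:

```python
from cffi import FFI

ffi = FFI()
# Declare the C signature we intend to call, copied from the C headers.
ffi.cdef("size_t strlen(const char *s);")
# dlopen(None) exposes symbols already linked into the process (libc).
C = ffi.dlopen(None)

print(C.strlen(b"hello pypy"))  # -> 10
```

Wrapping a real library works the same way: paste the relevant declarations into `cdef` and `dlopen` the shared object by name.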
Re: [pypy-dev] Question about extension support
Hi Kevin, Here is another link about writing extensions for PyPy. http://doc.pypy.org/en/latest/extending.html John On Tue, Mar 25, 2014 at 9:48 PM, John Camara wrote: > Hi Kevin, > > More up-to-date information can be found on the FAQ page > > > http://doc.pypy.org/en/latest/faq.html#do-cpython-extension-modules-work-with-pypy > > The best approach for PyPy is to either use a pure Python module if possible > or use a cffi-wrapped extension instead of an extension that uses the > CPython C API. Often CPython C API extensions are wrapping some C library. > Creating a cffi wrapper for the library is actually much simpler than > writing a CPython C API wrapper. Quite a few CPython C API extensions have > already been wrapped for cffi, so make sure to search for one before > creating your own wrapper. If you need to create a wrapper, refer to the > cffi documentation at > > http://cffi.readthedocs.org/en/release-0.8/ > > Extensions wrapped with cffi are compatible with both CPython and PyPy. > On CPython the performance is similar to what you would get if you used > ctypes. However, under PyPy, the performance is much closer to a native > C call plus the overhead for releasing and acquiring the GIL. > > John
Re: [pypy-dev] Parallella open hardware platform
Fijal, Whether someone works full time on a project is a separate issue. Being popular helps attract additional resources, and PyPy is a project that could use additional resources. How many additional optimizations could PyPy add to get to a similar level of optimization to, say, the JVM? We are talking many, many man-years of work. How much additional work is it to develop and maintain backends for the various ARM, PPC, MIPS, etc. processors? How much work would it take to have PyPy support multi-cores? What if RPython needs to be significantly refactored or replaced? And we can go on and on. Typically every 10 years or so a new language becomes dominant, but that hasn't happened lately. Java had been in the role for quite some time, and for quite a few years it has been on the decline, but yet no language has taken its place in terms of dominance. The main reason why this hasn't happened so far is that no language has successfully dealt with the multi-core issue in a way that also keeps other desirable features we currently have with popular languages. But at some point, a language will prevail and become dominant, and when that happens there will be a mass migration to this language. It doesn't mean that Python and other currently popular languages are just going to go away; it's just that their use will decline. If Python's popularity declines significantly it will in turn impact PyPy. Also many of the early adopters of PyPy are more likely to move on to the new dominant language. So where does that leave you? I expect you earn a living by doing PyPy consulting, and thus you need PyPy to be popular. Now you don't have to believe that a new dominant language will emerge, but history says otherwise, and many have been fooled into thinking otherwise in the past. I feel PyPy is Python's best chance at being able to survive this change in language dominance, as it has the best chance of being able to do something about the multi-core situation. 
I'm glad the other day you mentioned the web stack, as if you hadn't mentioned it I likely would not have thought about the PyPy hypervisor scenario. I'm starting to believe that approach may have some decent merit to it and allow a way to kick the can down the road on the multi-core issues. I don't have the time to get into it right now but I'll start a new thread on the topic. Maybe within the next few days. John On Thu, Feb 7, 2013 at 4:33 AM, Maciej Fijalkowski wrote: > On Thu, Feb 7, 2013 at 6:41 AM, John Camara > wrote: > > Fijal, > > > > In the past you have complained about it being hard to make money in open > > source. One way to make it easier for you is to grow the popularity of PyPy. > > So I would think you would at least have some interest in thinking of > ways > > to accomplish that. > > Before even reading further - how is being popular making money? Being > popular is being popular. Do you know any CPython developer working > full time on CPython? CPython is definitely popular by my standards
Re: [pypy-dev] Parallella open hardware platform
Fijal, In the past you have complained about it being hard to make money in open source. One way to make it easier for you is to grow the popularity of PyPy. So I would think you would at least have some interest in thinking of ways to accomplish that. I'm not trying to dictate what PyPy should do but merely offering an opinion that I see an opportunity that could potentially be a great thing for PyPy. A year ago if someone had asked me if PyPy should support embedded systems I would have given a firm no, but I see the market changing in ways I didn't expect. The people hacking on these devices are fairly similar to open source developers, and in some cases they even do open source development. They do things differently from the establishment, which has provided a new way to think about manufacturing. Their ways are so different from the establishment, and have become such a game changer, that they have ignited what is becoming a manufacturing revolution. Now, because many who are involved in hacking with this hardware have no prior experience with the established ways of doing this type of business, they are moving in directions that differ in how these devices get programmed. They are also in need of tools and new infrastructure, and I feel that what PyPy has to offer can give them a starting point. Now at the end of the day I don't believe many of their requirements are going to be much different than the requirements for other markets, and not likely too different from the direction PyPy will likely take. So why not go where all the big money is going to be. Ok, enough of that. Let's take a look at your example of a web stack. I believe right now PyPy is in a position to be used in this market. 
Sure, PyPy could use some additional optimizations to improve the situation, but I think in general it's already able to kick ass compared to CPython in terms of performance when a light web framework is used, which is becoming increasingly popular as web apps push the front ends to do most of the layout/presentation work. Also, with the web becoming more dynamic and the number of requests increasing at a substantial rate, it becomes more important to reduce latencies, which tends to give PyPy an advantage. This is all great while the web stacks are running on traditional servers, but servers are changing. There are some servers being sold today that have hundreds of small cores, and in the not too distant future there will be systems that have a number of full cores and a much larger number of smaller cores which may or may not have similar architectures. For instance, servers with Phi coprocessors (8 GB of memory, 60 1 GHz cores with, I believe, 4 threads each, and a PCIe3 interface) have recently become available. How is PyPy going to handle this? Is this any different from the needs of embedded systems? No. PyPy is going to have to start paying attention to how data is accessed and will have to make optimizations based on the access patterns. That is, you have to make sure computational loads can offset the data transfer overhead. Today PyPy does not take this overhead cost into account, which is not required when running on one core. For a web application it would be nice to run multiple sessions on a given core and save session-related data locally to that core so as to minimize data transfer to the smaller cores, which means directing all requests for the session to the same core, doing any necessary encryption on these small cores, etc. But there may also be some work for a particular request which might not be appropriate to run on a small core and may have to run on the main core, maybe due to it requiring access to too much data. How is this going to work? 
Is PyPy going to do all the analysis itself or will the programmer provide some hints to PyPy as to how to break up the work? Who is going to be responsible for the scheduling and for cleaning up the session data that is cached locally to the cores, and a boatload of other issues? I'm not sure; it's a tough problem and one that is just around the corner. Another option would be to run an HTTP load balancer on the main cores, with PyPy web stacks running on, say, dedicated Phi cores, and the HTTP requests forwarded over the PCIe bus. That way each Phi core acts like an independent web server. But running 60-240 PyPy processes in 8 GB of memory is quite the challenge. Maybe some sort of PyPy hypervisor that is able to run virtualized PyPy instances, so that each instance can share all the JITed code but have its own data. I'm sure many issues and questions exist, like who would do the JITting, the hypervisor or the virtualized PyPy instances? Now even if you feel right now is not the time to start worrying about these new server architectures, there are still other issues PyPy will start to run into in the web stack market. Typically for a web application that is being accessed from the Internet there is a certain amount of latency that is acceptable. But what hap
Re: [pypy-dev] Parallella open hardware platform
Hi Armin, It's even worse: I'm asking you to support something I don't even need. When I posted this thread it was getting rather long and unfortunately I didn't really make all the points I wanted to make. At this point, and even for some time now, PyPy has had a great foundation but its use remains low. Every now and then it's good to step back a little bit, reflect on the current situation, and come up with a strategy that helps the project's popularity grow. I know that PyPy has done things to help with the growth, such as writing blog posts, being quick to fix bugs, helping others with their performance issues and even rapidly adding optimizations to PyPy, presenting at conferences, and often actively being engaged in commenting on any posts or comments made about PyPy. So PyPy is doing a lot of things right to help its PR, but yet there is this issue of slow growth. Now we know the main issue with its growth is the fact that the Python ecosystem relies on a lot of libraries that use the CPython API, and PyPy just doesn't have full support for this interface. I understand the reasons why PyPy is not going to support the full interface, and PyPy has come up with the cffi library as a way to bridge the gap. And of course I don't expect the PyPy project to take on the responsibility of porting all the popular 3rd party libraries that use the CPython API to cffi. It's going to have to be a community effort. One thing that could help would be more marketing of cffi, as very few Python developers know it exists. But that alone is not going to be enough. History tells us that most successful products/projects that become popular do so by first supporting the needs of some niche market. As time goes by that niche market starts providing PR that helps other markets to discover the product/project, and the cycle can sometimes continue until there is mass adoption. 
Now when PyPy started to place a focus on NumPy I had hoped that the market it serves would turn out to be the market that would help PyPy grow. But at this point in time it does not appear like that is going to happen. For a while I have been trying to think of a niche market that may be helpful. But to do so you have to consider the current state of PyPy, which means eliminating markets that heavily rely on libraries that use the CPython API; also avoiding the NumPy market, as that's currently being worked on; there is the mobile market, but that's a tough one to get into; maybe the gaming market could be a good one; etc. It turns out with the current state of PyPy many markets need to be eliminated if you're looking for one that is going to help with growth. The Parallella project, on the other hand, looks like it could be a promising one, and I'll share some thoughts a little later in this post as to why I feel this way. Right now you have been putting a lot of effort into STM, in which you're trying to solve what is likely the biggest challenge that the developer community is facing. That is, how to write software that effectively leverages many cores in a way that is straightforward and in the spirit of Python. When you solve this problem, and I have the faith that you will, most would think that it would cause PyPy's popularity to skyrocket. What most likely will happen is that PyPy gets a temporary boost in popularity, as there is another lesson in history to be concerned about. Often the first to solve a problem does not become popular in the long run. Usually the first to solve the problem does so via a unique solution, but once people start using it, issues with the approach get discovered. Then often many others will use the original solution as a starting point and modify it to eliminate these new issues. Then one of the second-generation solutions ends up being the de facto standard. 
Now, PyPy is able to move fairly quickly in terms of implementing new approaches, so it may in fact be able to compete just fine against other second-generation solutions. But there may be some benefit to first exposing STM to a smaller market, to buy PyPy some additional time before releasing it as a solution for the general developer community.

So why the Parallella project? Well, I think it can be helpful in a number of ways. First, I don't believe this market is going to need much from the libraries that use the CPython API. Many in this market are used to programming for embedded systems, are more likely to have the skills to help out the PyPy project in a number of areas, and would likely also have a financial incentive to contribute back to PyPy, such as helping keep various back ends up to date (ARM, PPC, and additional architectures). Some in this market are used to using a number of graphical languages to program their devices, but unfortunately for them some of the new products that need to enter the market can't be built fully with these graphical languages. Well, with the PyPy framework it's possible for
Re: [pypy-dev] Should jitviewer come with a warning?
On Mon, Feb 4, 2013 at 3:42 AM, Maciej Fijalkowski wrote:
> Seriously which ones? I think msgpack usage is absolutely legit. You
> seem to have different opinions about the design of that software, but
> you did not respond to my concerns even, not to mention the fact that
> it sounds like it's not "obfuscated by jitviewer".
>
> Cheers,
> fijal

First, I would have tried using cffi against the msgpack C library. If I wasn't happy with it, I would do a Python port. So for now let's forget about cffi and just deal with the current design of this library. I had tried to minimize the discussion about this library on this forum, as I had already written extensive comments on the original blog [1]. Now, I didn't do an extensive review of the code; I only concentrated on a small portion of it, namely the area of unpacking msgpack messages. I'll just highlight a couple of concerns I had.

The first thing that shocked me was the use of the struct.pack and struct.unpack functions. Normally, when you need to pack and unpack often with the same format, you would create a struct.Struct object with the desired format and use that object's pack and unpack methods. That way the format string is not parsed on every call, but only once, when the Struct object is created. As Bas pointed out, PyPy is able to optimize the parsing of the format, which is great, but why would you prefer to write code that runs with horrible performance under CPython when there is an alternative available? Now, toward the end of the comments on the blog, Bas stated he tried the Struct object under PyPy and found it ran slower. So there is likely an opportunity for PyPy to add another optimization: if PyPy can optimize the struct functions, it should be able to handle Struct objects as well, which I would think is an easier case to handle, purely looking at it from a high-level perspective.

Another issue I had is that the msgpack spec is designed in a way that minimizes the need to copy data.
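A minimal sketch of the point about precompiling the format (the format string here is illustrative, not msgpack's actual wire layout):

```python
import struct

# Parsed on every call: the format string ">BI" is re-scanned each time.
def unpack_slow(buf):
    return struct.unpack(">BI", buf)

# Parsed once: struct.Struct compiles the format at creation time,
# so repeated unpack calls skip the format-string parsing step.
HEADER = struct.Struct(">BI")

def unpack_fast(buf):
    return HEADER.unpack(buf)

data = HEADER.pack(7, 1234)
assert unpack_slow(data) == unpack_fast(data) == (7, 1234)
```

Under CPython the precompiled form avoids repeated parsing work; under PyPy the JIT can often make the two equivalent, which is exactly the asymmetry discussed above.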
That is, you should be able to use the data directly from the message buffers. The normal way to do this with the struct module is to use the unpack_from and pack_into methods instead of pack and unpack. These methods take a buffer and an offset, as opposed to unpack, which requires you to slice out a copy of the original buffer to pass in. As Bas pointed out, again, PyPy is able to optimize away the copy created by slicing, which is great, but again, why code it in a way that will be slow on CPython when there is an alternative?

The other issue I mentioned on the blog was the large number of if/elif statements used to handle each type of msgpack message. I instead suggested creating, essentially, a list that holds references to Struct objects, so that the message type can be used as an index into this list. That way you remove all the if/elif statements and end up with something like struct_objects[message_type].unpack_from(). Now, I understand that PyPy is able to optimize all these if and elif statements by creating bridges for the various paths through the code, but again, why code it this way when it will be slow on CPython? I would also assume that the if/elif chain still has more overhead in PyPy compared to using a list of references, although maybe there is not much of a difference.

Anyway, these are just the issues I saw with this library, which by the way is nowhere near as bad as other code I have seen written as a result of users using the jitviewer. Unfortunately, I could not discuss these other projects as they are closed source. Anyway, to get to the other part of your reply, I assume not responding to your concerns is about the following: "python is nicer. It does not segfault. Besides, how do you get a string out of a C library? if you do raw malloc it's prone to be bad. Etc. etc." Sorry, that was an oversight.
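A hedged sketch of the dispatch-list idea; the type codes, formats, and message layout below are invented for illustration and are not msgpack's real encoding:

```python
import struct

# Hypothetical message layout: a one-byte type code, then a fixed payload.
# One precompiled Struct per type code, indexed directly by that code --
# no if/elif chain needed.
DECODERS = [
    struct.Struct(">B"),   # type 0: unsigned 8-bit int
    struct.Struct(">H"),   # type 1: unsigned 16-bit int
    struct.Struct(">I"),   # type 2: unsigned 32-bit int
]

def decode(buf, offset=0):
    # Read the type code, then dispatch through the list.
    # unpack_from reads at an offset, so no slice copy is made.
    msg_type = buf[offset]
    (value,) = DECODERS[msg_type].unpack_from(buf, offset + 1)
    return value

msg = bytes([2]) + struct.pack(">I", 99999)
assert decode(msg) == 99999
```

The same table works for encoding with pack_into, and it stays fast on CPython because neither format parsing nor branching grows with the number of message types.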
I feel the same way about Python, but what's the real issue with taking the practical approach of using a C library that is well written and robust? I would love to see everything written in Python, but who has the time to port everything over? With the msgpack C library, it would have the responsibility of maintaining the buffers; its API supports creating and freeing them. The msgpack library would be doing most of the work, and the only data that has to go back and forth between the Python code and the library is basic types like int, float, double, strings, etc. To get a string out of the C library, just slice a cffi.buffer to create a copy of it in Python before calling the function that clears the msgpack buffer. With cffi, this slicing to copy strings into Python, plus the overhead of calling into the C functions, does add extra work compared with code written purely in Python, assuming PyPy has all the optimizations in place to match the performance of the msgpack C library. The
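A hedged sketch of the cffi.buffer slicing idea. Rather than calling the real msgpack C API, the C-owned memory here is simulated with a cffi allocation, so the buffer-handling pattern is the only thing being shown:

```python
from cffi import FFI

ffi = FFI()

# Stand-in for a buffer owned by a C library: raw C memory allocated
# via cffi instead of returned by a real msgpack call.
raw = ffi.new("char[]", b"hello from C")

# ffi.buffer wraps the C memory without copying; slicing the buffer
# produces an independent Python bytes copy that remains valid even
# after the C side frees or reuses its buffer.
view = ffi.buffer(raw, 12)
copied = view[:]          # detached bytes copy

assert copied == b"hello from C"
```

This is the pattern referred to above: copy the string out with one slice, then let the C library reclaim its buffer.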
Re: [pypy-dev] Should jitviewer come with a warning?
> that is definitely a no (my screen is too small to have some noise
> there, if for no other reason), it might have a warning in the
> documentation though, if it's any useful. But honestly, I doubt such a
> warning makes any sense. People who are capable of using jitviewer
> already "know better".

I agree it should not be part of the normal output. I would say add it to the docstring in app.py and to the README file. As for people using the jitviewer already "knowing better": if that were the case I wouldn't have started this thread. Like you said earlier, the use of the jitviewer is only promoted on IRC, and yet over the last 2 weeks I have come across 3 people working on different projects who were using it for the wrong reasons. It's like it is the new RPython, where people start using it for the wrong reasons.

___ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev
Re: [pypy-dev] Should jitviewer come with a warning?
> Also, looking at the msgpack - this code is maybe not ideal, but if
> you're dealing with buffer-level protocols, you end up with code
> looking like C a lot.

I do agree that this type of code will likely end up looking like C, but it's not necessary for all of it to look like C. For example, there shouldn't be a need for long chains of if/elif statements, and using the pack_into and unpack_from methods instead of pack and unpack means the code deals directly with the buffer instead of making substrings. Even if PyPy can optimize this away, why write Python code like this when it's not necessary?

Plus I felt that, initially, the code should just use cffi and connect to the native C library. I believe this approach is likely to give very close to the best performance you could get on PyPy for this type of library. I'm not sure how much of a performance increase would be gained by writing the library completely in Python vs using cffi. Is there anything wrong with this line of thinking? Do you feel a pure Python approach could achieve better results than using cffi under PyPy?

John
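As a hedged sketch of the pack_into side of this point (the record format and layout are invented for illustration), writing fields in place into one preallocated buffer rather than concatenating packed substrings:

```python
import struct

# Illustrative fixed record: 16-bit id followed by a 32-bit value.
REC = struct.Struct(">HI")

# One reusable output buffer; pack_into writes each record in place
# at a computed offset, so no intermediate bytes objects are created.
buf = bytearray(REC.size * 3)
for i, (rec_id, value) in enumerate([(1, 10), (2, 20), (3, 30)]):
    REC.pack_into(buf, i * REC.size, rec_id, value)

# Read the records back with unpack_from, again without slice copies.
records = [REC.unpack_from(buf, i * REC.size) for i in range(3)]
assert records == [(1, 10), (2, 20), (3, 30)]
```

This is the buffer-level style the quoted message describes; it does read a bit like C, but it is fast on CPython and PyPy alike without any interpreter-specific tricks.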
Re: [pypy-dev] Should jitviewer come with a warning?
> Let me rephrase it. Where did you look for such a warning and you did
> not find it so you assumed it's ok?
> Cheers,
> fijal

Having a warning on https://bitbucket.org/pypy/jitviewer would be good.

On Sun, Feb 3, 2013 at 3:08 PM, John Camara wrote:
> > What makes you think people will even read this warning, let alone
> > prioritize it over their immediate desire to make their program run
> > faster?
> > (Not that I am objecting to adding the warning, but I think you might be
> > fooling yourself if you think it will have any impact)
> > Jean-Paul
>
> I agree with you and was not being naive in thinking this alone was going
> to solve the problem, but it gives us something to point to when we see
> someone abusing the jitviewer.
>
> Maybe a more effective approach is not to advertise the jitviewer to
> everyone who has performance issues, and only tell those who are
> experienced programmers and have already done the obvious work of fixing
> any design issues in their code. Having inexperienced developers use the
> normal profiling tools will still help them find the hot spots in their
> code and help prevent them from picking up habits that lead to writing
> un-Pythonic code.
>
> I'm sure we all agree that code with a better design will run faster in
> PyPy than trying to add PyPy-only optimizations to prop up a poor design.
>
> I don't think we want to end up with a lot of Python code that looks like
> C code. This is what happens when the inexperienced start relying on the
> jitviewer.
>
> For instance, take a look at this code [1] and blog [2], which led me to
> post this. This is not the first example where I have come across this
> issue, and unfortunately it appears to be increasing at an alarming rate.
>
> I guess I feel we have a responsibility to try to promote good programming
> practices when we can.
> > [1] -
> > https://github.com/msgpack/msgpack-python/blob/master/msgpack/fallback.py
> >
> > [2] - http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/
> >
> > John
> >
> > On Sun, Feb 3, 2013 at 12:39 PM, John Camara wrote:
> >> I have been noticing a pattern where many who are writing Python code
> >> to run on PyPy are relying more and more on the jitviewer to help them
> >> write faster code. Unfortunately, many of them don't look at improving
> >> the design of their code as a way to improve the speed at which it will
> >> run under PyPy, but instead start writing obscure Python code that
> >> happens to run faster under PyPy.
> >>
> >> I know that at least the PyPy core developers would like to see
> >> everyone just write good clean Python code, and that often code that
> >> has been made obscure was done so to optimize it for CPython, which in
> >> many cases causes it to run slower on PyPy than it would if the code
> >> just followed typical Python idioms.
> >>
> >> I feel that a normal developer should be using tools like cProfile and
> >> RunSnakeRun and cleaning up design issues well before they even
> >> consider using the jitviewer.
> >>
> >> In a recent case I saw someone using the jitviewer who likely doesn't
> >> need it, at least not considering the current design of their code, and
> >> I said the following:
> >>
> >> "The jitviewer should mainly be used by PyPy core developers and those
> >> building PyPy VMs. A normal developer writing Python code to run on
> >> PyPy shouldn’t need to use it. They can use it to point out an
> >> inefficiency in PyPy to the core developers, but it should not be used
> >> as a way to get you to write Python code that has a better chance of
> >> being optimized under PyPy, except on very rare occasions, and even
> >> then only by those who follow closely and understand PyPy’s
> >> development."
> >>
> >> Do others here share this same opinion, and should some warning be
> >> added to the jitviewer?
> >>
> >> John
Re: [pypy-dev] Should jitviewer come with a warning?
> What makes you think people will even read this warning, let alone
> prioritize it over their immediate desire to make their program run
> faster?
> (Not that I am objecting to adding the warning, but I think you might be
> fooling yourself if you think it will have any impact)
> Jean-Paul

I agree with you and was not being naive in thinking this alone was going to solve the problem, but it gives us something to point to when we see someone abusing the jitviewer.

Maybe a more effective approach is not to advertise the jitviewer to everyone who has performance issues, and only tell those who are experienced programmers and have already done the obvious work of fixing any design issues in their code. Having inexperienced developers use the normal profiling tools will still help them find the hot spots in their code and help prevent them from picking up habits that lead to writing un-Pythonic code.

I'm sure we all agree that code with a better design will run faster in PyPy than trying to add PyPy-only optimizations to prop up a poor design.

I don't think we want to end up with a lot of Python code that looks like C code. This is what happens when the inexperienced start relying on the jitviewer.

For instance, take a look at this code [1] and blog [2], which led me to post this. This is not the first example where I have come across this issue, and unfortunately it appears to be increasing at an alarming rate.

I guess I feel we have a responsibility to try to promote good programming practices when we can.

[1] - https://github.com/msgpack/msgpack-python/blob/master/msgpack/fallback.py
[2] - http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/

John

On Sun, Feb 3, 2013 at 12:39 PM, John Camara wrote:
> I have been noticing a pattern where many who are writing Python code to
> run on PyPy are relying more and more on the jitviewer to help them write
> faster code.
> Unfortunately, many of them don't look at improving the design of their
> code as a way to improve the speed at which it will run under PyPy, but
> instead start writing obscure Python code that happens to run faster under
> PyPy.
>
> I know that at least the PyPy core developers would like to see everyone
> just write good clean Python code, and that often code that has been made
> obscure was done so to optimize it for CPython, which in many cases causes
> it to run slower on PyPy than it would if the code just followed typical
> Python idioms.
>
> I feel that a normal developer should be using tools like cProfile and
> RunSnakeRun and cleaning up design issues well before they even consider
> using the jitviewer.
>
> In a recent case I saw someone using the jitviewer who likely doesn't need
> it, at least not considering the current design of their code, and I said
> the following:
>
> "The jitviewer should mainly be used by PyPy core developers and those
> building PyPy VMs. A normal developer writing Python code to run on PyPy
> shouldn’t need to use it. They can use it to point out an inefficiency in
> PyPy to the core developers, but it should not be used as a way to get you
> to write Python code that has a better chance of being optimized under
> PyPy, except on very rare occasions, and even then only by those who
> follow closely and understand PyPy’s development."
>
> Do others here share this same opinion, and should some warning be added
> to the jitviewer?
>
> John
[pypy-dev] Should jitviewer come with a warning?
I have been noticing a pattern where many who are writing Python code to run on PyPy are relying more and more on the jitviewer to help them write faster code. Unfortunately, many of them don't look at improving the design of their code as a way to improve the speed at which it will run under PyPy, but instead start writing obscure Python code that happens to run faster under PyPy.

I know that at least the PyPy core developers would like to see everyone just write good clean Python code, and that often code that has been made obscure was done so to optimize it for CPython, which in many cases causes it to run slower on PyPy than it would if the code just followed typical Python idioms.

I feel that a normal developer should be using tools like cProfile and RunSnakeRun and cleaning up design issues well before they even consider using the jitviewer.

In a recent case I saw someone using the jitviewer who likely doesn't need it, at least not considering the current design of their code, and I said the following:

"The jitviewer should mainly be used by PyPy core developers and those building PyPy VMs. A normal developer writing Python code to run on PyPy shouldn’t need to use it. They can use it to point out an inefficiency in PyPy to the core developers, but it should not be used as a way to get you to write Python code that has a better chance of being optimized under PyPy, except on very rare occasions, and even then only by those who follow closely and understand PyPy’s development."

Do others here share this same opinion, and should some warning be added to the jitviewer?

John
[pypy-dev] Parallella open hardware platform
A couple of days ago I heard about the Parallella [1] project, an open hardware platform similar to the Raspberry Pi but with much higher capabilities. It has a Zynq Z-7010, which combines a dual-core ARM A9 (800 MHz) processor with an Artix-7 FPGA, plus a 16-core Epiphany multicore accelerator and 1 GB of RAM (see [2] for more info), and it currently boots Ubuntu. The goal of the Parallella project is to develop an open parallel hardware platform and development tools.

Recently they announced support for Python, with Mark Dewing [3] leading the effort. I had asked Mark if he considered PyPy, but at this time he doesn't have time for that investigation, and he reposted my comment on the forum [4] with a couple of questions. Maybe one of you could answer them.

Working with the Parallella project may be a good opportunity for the PyPy project, from both a PR perspective and for the technical challenges it would present. On the technical side it would give the opportunity to test STM on a reasonable number of cores while also dealing with cores from different architectures (ARM and Epiphany). I could see all the JITting occurring on the ARM cores, producing output for both architectures based on which type of core STM decides a chunk of work should execute on. Of course there is also the challenge of bridging between the 2 architectures. Maybe even some of the more expensive STM operations could be offloaded to the FPGA, or a limited amount of very hot code could be JITted to the FPGA (although this might be more work than it's worth).

From a PR perspective, PyPy needs to excel at some niche market so that the PyPy platform can take off. When PyPy started concentrating on the scientific market, with increasing support for NumPy, I thought this would be the niche market that would get PyPy to take off. But there have been a couple of issues with this approach.
There is a tremendous amount of work that needs to be done for PyPy to look attractive to this niche market. It requires supporting both NumPy and SciPy, and there was an expectation that if PyPy supported NumPy, others would come to help out with SciPy support. The problem is that there don't seem to be many who are eager to pitch in on the SciPy effort, and there also has not been a whole lot of help with the ongoing NumPy work. I think in general the ratio of people who use NumPy and SciPy to those willing to contribute is quite small. So while going after this market was a good idea and definitely offers the opportunity to show the strengths of the PyPy project, it hasn't done much to improve the project's image. It also doesn't help that some commercial interests have popped up recently and decided to play hardball against PyPy by spreading FUD.

Unlike the Raspberry Pi hardware, which can only support hobbyists, the Parallella hardware can support both hobbyists and commercial interests. The boards cost $100, which is more than the $35 Raspberry Pi but still within reach of most hobbyists, and they didn't cut out the many features that commercial interests need. The Parallella project raised nearly $0.9 million on Kickstarter [5], with nearly 5000 backers.

Since many who will use the Parallella hardware also have experience with embedded systems, they are more likely to be used to writing low-level code in assembly, FPGAs, and even lots of C, and I'm sure they have hit many issues with parallel/multithreaded programming and would welcome a better developer experience. I bet many of them would be willing to contribute both financially and with their time to support such an effort. I believe the architecture of PyPy could lend itself to becoming the core of such a development system and would allow Python to be used in this space. This could provide a lot of good PR for the PyPy project.
Now, I'm not saying PyPy shouldn't devote any more time to supporting NumPy; I'm sure when PyPy has very good support for both NumPy and SciPy it's going to be a very good day for all Python supporters. I just think the PyPy team needs to think about a strategy that will, in the end, help its PR and gain support from a much larger community. This project is doing a lot of good things technically, and now it just needs to get the attention of the development community at large. I can't predict whether working with the Parallella project would be the PR breakthrough that PyPy needs, but it's at least an option that's out there.

BTW, I don't have any commercial interests in the Parallella project. If some time in the future I use their hardware, it would likely be as a hobbyist, and it would be nice to program it in Python. My real objective with this post is to see the PyPy project gain wider interest, as that would be a good thing for Python.

[1] - http://www.parallella.org/
[2] - http://www.parallella.org/board/
[3] - http://forums.parallella.org/memberlist.php?m