On Mon, Feb 4, 2013 at 6:28 PM, John Camara <john.m.cam...@gmail.com> wrote: > On Mon, Feb 4, 2013 at 3:42 AM, Maciej Fijalkowski <fij...@gmail.com> wrote: > >> >> Seriously which ones? I think msgpack usage is absolutely legit. You >> seem to have different opinions about the design of that software, but >> you did not respond to my concerns even, not to mention the fact that >> it sounds like it's not "obfuscated by jitviewer". >> >> Cheers, >> fijal > > > First I would have tried using cffi to the msgpack c library. If I wasn't > happy with it I would do a Python port. So for no lets forget about cffi > and just deal with the current design of this library. > > I had tried to minimize the discussion about this library on this forum as I > had already wrote extensive comments on the original blog [1]. Now I didn't > do an extensive review of the code as I only concentrated on a small portion > of it namely in the area of unpacking the msgpack messages. I'll just > highlight a couple of concerns I had. > > The first thing the shocked me was the use of the struct.pack and > struct.unpack functions. Normally when you need to pack and unpack often > with the same format you would create a struct object with the desired > format and use this object with its pack and unpack methods. That way the > format string is not always being parsed but instead once when the struct > object is created. > > As Bas pointed out pypy is able to optimize the parsing of the format which > is great but why would you prefer to write code that would run with horrible > performance under CPython when there is an alternative available. Now > toward the end of the comments on the blog, Bas stated he tried the struct > object under pypy and found it ran slower. So there is likely an > opportunity for pypy to add another optimization as if pypy can optimize the > struct functions it should be able to handle the struct objects which I > would think would be an easier case to handle purely looking at it from a > high level perspective.
It's a fallback for PyPy, so CPython speed is irrelevant. Also CPython has tons of weird quirks and "faster for PyPy and slower for CPython" is not always a bad thing. Personally I don't care. This particular example however should be reported as a bug in PyPy - using Struct is *nicer*, so it should be as fast (and there is no good reason why not). > > Another issue I had was the msgpack spec is designed in a way to minimize > the need of copying data. That is you should be able to just use the data > directly from the message buffers. The normal way to do this with the > struct module is to use the unpack_from and pack_into methods instead of the > pack and unpack methods. These methods take a buffer and an offset as > opposed to the pack and unpack which would require you to slice out a copy > of the original buffer to pass it in the unpack method. As Bas pointed out > again pypy is able to optimize this copy created from slicing away which is > great but again why code it in a way that will be slow on CPython when there > is an alternative. Python buffer support sucks. For example you don't get a string out (because strings are immutable). PyPy buffer support double sucks, because buffer protocol is broken and we also didn't care. Fortunately we're able to optimize string slicing here (strings are nicer than buffers or bytearrays to play with), but we should fix buffers. Sorry about that. Again, the CPython speed does not apply. > > The other issue I mentioned on the blog was the large number of if, elif > statements used to handle each type of msgpack message. I instead suggested > creating essentialy a list that holds references to struct objects so that > the message type would be used as in index into this list. So that way you > remove all the if, elif statements and end up with something like > > struct_objects[message_type].unpack_from() Lack of constant propagation. Again, a potential bug in PyPy, but a hard one. > > Now I understand that pypy is able to optimize all these if and elif > statements by creating bridges for the various paths through this code but > again why code it this way when it will be slow on CPython. I would also > assume that using the if elif statements would still have more overhead in > pypy compared to using a list of references although maybe there is not much > of a difference. It's not about if/elif or references (all those things are incredibly cheap), but about constant propagation. Notably determining that a format is constant. This would disappear if we fix Struct (it's an easy fix, a few hours of work for someone not experienced with PyPy) > > Any way this is just the issues I saw with this library which by the way is > no where near as bad as other code I have seen written as a result of users > using the jitviewer. Unfortunately, I could not discuss these other > projects as they are closed source. And we're unable to help you because of that. > > Any way to get to the other part of you reply I assume not responding to > your concerns is about the following > > "python is nicer. It does not segfault. Besides, how do you get a > string out of a C library? if you do raw malloc it's prone to be bad. > Etc. etc." > > Sorry that was an over sight. I feel the same way about Python but what's > the real issue of taking the practical approach of using a c library that is > written well and is robust. I would love to see everything written in > Python but who has the time to port everything over. If you're dealing with a data coming from the outside using Python over C lib sounds like a very sensible idea security-wise. I can't blame anyone here. I would do the same (given that the protocol is simple enough as well). Cheers, fijal _______________________________________________ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev