I know that the C in CFFI stands for the C way of doing things, so I hope people won't try to defend that position and will instead think about what it would take to re-engineer ABI access from scratch, as an explicit and obvious binary interface that is easy to debug.
CFFI is not useful for Python programmers, and here is why. The primary reason is that it requires you to know C. Knowing C requires knowing about OS architecture, and knowing about OS architecture requires knowing about the ABI: http://stackoverflow.com/a/3784697

The ABI is how the compiler builds an application. It defines (but is not limited to): how parameters are passed to functions (registers/stack), who cleans parameters off the stack (caller/callee), where the return value is placed, and how exceptions propagate. The problematic part is that you need to think of the OS ABI in terms of unusual C abstractions, coming through several levels of them.

Suppose you know the OS ABI and you know that you need direct physical memory access to set bytes for a certain call this way:

    0024: 00 00 00 6C 33 33 74 00

How would you do this in Python? The most obvious way is with a byte string - '\x00\x00\x00\x6c\x33\x33\x74\x00' - but that's not how you prepare the data for the call if, for example, the bytes 00 6C mean anything to you. What is the Python way to convert 00 6C to a convenient Python data structure and back, and is it Pythonic (user friendly and intuitive)?

    import struct
    struct.unpack('wtf?', '\x00\x6C')

If you try to look up the magic string in the struct docs: http://docs.python.org/2/library/struct.html#format-characters you'll notice that the mapping between possible combinations of these 2 bytes and Python types is very mystic. First it requires you to choose either "short" or "unsigned short", but that's not enough for parsing binary data - you also need to figure out the proper "endianness" and make up a magic string for it. This is just for two bytes. Imagine a definition for a binary protocol with variable message sizes and nested data structures. You won't be able to understand it by reading the Python code.
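To make the point concrete, here is a minimal sketch (mine, not from the post above) of what decoding those two bytes with struct actually looks like once you have guessed both the type and the endianness:

```python
import struct

data = b'\x00\x6C'  # the two bytes "00 6C" from the dump above

# '>H' = big-endian unsigned short: 0x006C == 108
(big,) = struct.unpack('>H', data)

# '<H' = little-endian unsigned short: 0x6C00 == 27648
(little,) = struct.unpack('<H', data)

print(big, little)  # 108 27648
```

The value you get depends entirely on a magic character that encodes both the type and the byte order at once, which is exactly the readability problem described above.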
More than that - Python *by default* uses platform-specific "endianness"; it is implicit about it, so not only do you have to care about endianness, you also have to be enough of an expert to find out which setting is correct for you. Look at this:

    0024: 00 00 00 6C 33 33 74 00

Where are the "endianness", "alignment" and "size" from this doc: http://docs.python.org/2/library/struct.html#byte-order-size-and-alignment People need to *start* with this base and these concepts, and that's why it is harmful.

CFFI proposes to provide a better interface that skips this complexity by getting back to the roots and using the C level. That's a pretty nice hack for C guys, and I am sure it makes them completely happy. But for the academic side of the PyPy project, for the Python interpreter and for other projects built on RPython, it is important to have a tool that allows experimenting with binary interfaces in a convenient, readable and direct way - one that makes it easier for humans to understand (by reading Python code) how Python instructions are translated by the JIT into binary pieces in computer memory, pieces that the operating system will process as a system function call at the ABI level.

But let's not digress; back to the point that the struct module doesn't let you work with structured data. In Python, the only alternative standard way to define a binary structure is ctypes, and the ctypes documentation is no better for a binary guy: http://docs.python.org/2/library/ctypes.html#fundamental-data-types See how that binary guy suffered to map binary data to Python structures through ctypes: https://bitbucket.org/techtonik/discovery/src/eacd864e6542f14039c9b31eecf94302f3ef99ec/graphics/gfxtablet/gfxtablet.py?at=default And I am saying that this is the best way available in the standard library. It is pretty close to Django models, but for binary data. Still, ctypes is worse than struct in one respect: looking at the docs, most fundamental C types carry no explicit size, so there is no guarantee that 2 bytes won't be read as 4, or worse.
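For completeness: ctypes does offer fixed-width aliases (c_uint8, c_uint16, c_uint32, ...) next to the platform-sized types, plus a big-endian base class, which mitigates the size problem somewhat. A minimal sketch of mapping the 8-byte dump above - assuming, purely for illustration, that the first four bytes are a big-endian length field followed by a 4-byte payload:

```python
import ctypes

# Hypothetical layout for the bytes "0024: 00 00 00 6C 33 33 74 00"
class Message(ctypes.BigEndianStructure):
    _pack_ = 1  # no padding, so the struct is exactly 8 bytes
    _fields_ = [
        ('length',  ctypes.c_uint32),    # 00 00 00 6C -> 108
        ('payload', ctypes.c_char * 4),  # 33 33 74 00 -> b'33t' + NUL
    ]

raw = b'\x00\x00\x00\x6C\x33\x33\x74\x00'
msg = Message.from_buffer_copy(raw)
print(msg.length, msg.payload)  # 108 b'33t'
```

This is the Django-models-for-binary flavor mentioned above, but note how much ceremony (_pack_, the base class, the array type) is needed to pin down 8 bytes.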
Looking at ctypes code, it is hard to figure out the size of a structure and when it may change. I can hardly call the ctypes mapping process user friendly, or the resulting code intuitive. Probably nobody could, and that's why CFFI was born. But CFFI took a different route: instead of trying to map C types to binary data (the ABI level), it decided to go to the API level. While it exposes many better tools, it basically means you are dealing with a C interface again - not with a Pythonic interface for binary data.

I am not saying that CFFI is bad - I am saying that it is good, but not enough, and that this can be fixed with a cleanroom engineering approach covering a broader scope of modern usage patterns for binary data than just calling OS APIs the C way.

Why do we need it? I frankly think that the Stackless way of doing things without a C stack is the future, and the problem is that not many people can see how it works, how it builds an alternative system without the classic C stack in (R)Python. Can CFFI help with this? I doubt it.

So, what am I proposing? Just an idea. Given the fact that I am mentally incapable of filling in the 100-sheet requirement to get funding under H2020, the fact that no existing commercial body is likely to support the development as an open source project, and the fact that hacking on it alone might become boring, giving away this idea is the least I can do.

Cleanroom engineering. http://en.wikipedia.org/wiki/Cleanroom_software_engineering "The focus of the Cleanroom process is on defect prevention, rather than defect removal." When we talk about the Pythonic way of doing things, how can we define "a defect"? Basically, we are talking about user experience - the emotions a user experiences when using Python for a given task. What is the task at hand?
For me it is working with binary data in Python - not just parsing save games, but creating binary commands such as OS system calls that are executed by a certain CPU, GPU or whatever is on the receiving end of whatever communication interface is used. This is a hardware-independent and platform-neutral way of doing things. So UX is the key, but the properties of the engineered product are not limited to a single task. The cleanroom approach allows concentrating on the defect - the point where user experience starts to suffer because of conflicts between the tasks users are trying to accomplish.

For the PyPy project I see value in a library for compositing binary structures, in that these operations can be pipelined and optimized at run time in a highly effective fashion. I think a convenient binary tool is the missing brick in the basement of the academic PyPy infrastructure: it would enable universal interoperability from (R)Python with other digital systems by providing a direct interface to the binary world.

I think the 1973 views on "high level" and "low level" systems are a little bit outdated now that we have Python, Ruby, Erlang and so on. C is just not a very good intermediary for "low level" access anymore. Frankly, with the advent of networking, I do not think binary can be called "low level" at all. It is just another data format, one that can be as readable for humans as program structure written in Python.

P.S. I have some design ideas for making an attractive gameplay out of binary data by "coloring" regions and adding "multi-level context" to hex dumps. This falls outside the scope of this issue and requires more drawing than texting, but if somebody wants to help me share the vision, I would not object. It would help make the binary world more accessible, especially for new people who start coding with JavaScript and Python. -- anatoly t.
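As a tiny illustration of how approachable this can be made in plain Python, here is a hexdump helper (my sketch, not part of any proposal above) that renders bytes in the "offset: bytes" style used throughout this post; coloring and context layers would build on something like this:

```python
def hexdump(data, width=8, offset=0):
    """Render bytes as lines of 'OFFS: HH HH ...', `width` bytes per line."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hexpart = ' '.join('%02X' % b for b in chunk)
        lines.append('%04X: %s' % (offset + i, hexpart))
    return '\n'.join(lines)

print(hexdump(b'\x00\x00\x00\x6C\x33\x33\x74\x00', offset=0x24))
# 0024: 00 00 00 6C 33 33 74 00
```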
_______________________________________________ pypy-dev mailing list [email protected] https://mail.python.org/mailman/listinfo/pypy-dev
