[protobuf] Customizing protoc to support an existing wire format
Hello,

I'd like to see if any prior work has been done in customizing protobuf compilation to support message encoding/decoding against a legacy wire format. Put another way, I'm interested in:

1. specifying an existing protocol using protobuf's .proto file syntax, and
2. reusing protobuf's .proto file parsing and code generation infrastructure, while
3. replacing protobuf's default encoding algorithm with one that conforms to an existing format.

This discussion from 2013 is the closest thing I've found to a similar question on this mailing list. Unfortunately it doesn't go into much detail:
https://groups.google.com/forum/#!topic/protobuf/zvughVLk6BU

Some context will probably be of use. The existing wire format in question is that of Bitcoin's peer-to-peer network protocol. These messages and their binary representations are defined in this document:
https://en.bitcoin.it/wiki/Protocol_specification#Message_types

Note that protocol buffers were considered for use during Bitcoin's initial development, but rejected on concerns around complexity and security:
https://bitcointalk.org/index.php?topic=632.msg7090#msg7090

Whether or not those concerns were well-founded, Bitcoin's resulting wire format works well today, and for this reason, changing it is not considered to be an option.

The impetus for this question, then, is that there are an increasing number of implementations of the Bitcoin protocol under development today, and in order to participate in the peer-to-peer network, each must faithfully re-implement handling of this custom wire format. Typically this work is done through a combination of studying the documentation above and carefully transcribing code from the Bitcoin Core reference implementation. This creates a significant barrier to entry as well as a potential source of bugs that can threaten network stability.
To avoid this tedious and error-prone work, there is a desire to codify the message formats in such a way that language-specific bindings may be generated rather than hand-coded. The encoding algorithm and code generation for each specific language would of course have to be custom developed, but the idea is to do so within an otherwise widely-used framework such as protocol buffers, minimizing the need to re-invent as much as possible.

I have not yet looked deeply at the extension points within protocol buffers to assess the feasibility of this idea. I have seen that protoc supports plugins [1], but don't know whether anyone has gone so far with them as to replace fundamental assumptions about wire format. I have also noticed Custom Options [2], which may help in expressing particular quirks or nuances of the existing protocol within .proto files.

At this point, I'd simply like to see whether anyone has been down this road before, and whether there are reasons for dismissing the idea completely before digging in too much further.

- Chris

P.S: Please note that in posting this question I am in no way presuming to represent the Bitcoin Core development team.

[1]: https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.compiler.plugin.pb
[2]: https://developers.google.com/protocol-buffers/docs/proto#options

--
You received this message because you are subscribed to the Google Groups Protocol Buffers group. To unsubscribe from this group and stop receiving emails from it, send an email to protobuf+unsubscr...@googlegroups.com. To post to this group, send email to protobuf@googlegroups.com. Visit this group at http://groups.google.com/group/protobuf. For more options, visit https://groups.google.com/d/optout.
Re: [protobuf] Customizing protoc to support an existing wire format
There are two ways to support a custom wire format with protobufs.

1. Implement the parsing/serializing code as a runtime library. The text format support in protobuf can be seen as such a library. Support for JSON/XML is also done using this approach. It relies on protobuf's reflection support, which allows you to query the type information of a protobuf message and manipulate arbitrary protobuf messages (like the reflection support in languages such as Java, but for protobufs).

2. Override protobuf's code generation behavior to inject code into the generated classes or to generate completely new custom classes. To do this you have two choices: build a custom protoc binary, or use plugins. Both require an implementation of the CodeGenerator interface. Many people who implement protobufs for languages other than the officially supported C++, Java, and Python take the first of these routes, probably because it's easy to do and the use of plugins is not well documented. The use of plugins is the recommended approach, though.

If you are only implementing this wire format for C++ and don't care that much about performance, 1) is probably easier to try out. Otherwise you'll need to use the code generation approach in 2). For examples, have a look at the third-party add-ons list:
https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns
The Programming Languages section covers how to generate new classes, and the RPC section has examples of how to insert code into existing generated code. They are not exactly the same as supporting a new wire format, but the implementation techniques should be no different.

In your case, you can either generate new custom classes to encode/decode the custom wire format, or generate additional parsing/serializing methods in the existing classes. Note that support for the latter is limited: it's best supported in C++ and not very well supported in Java/Python. For other languages it might not be supported at all.
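To make approach 1) a little more concrete, here is a minimal sketch in Python of a serializer that walks a message's field metadata and emits Bitcoin-style bytes. It does not use the protobuf library at all: `dataclasses.fields()` stands in for protobuf's descriptor/reflection API, and the `wire` metadata key stands in for a custom option in a .proto file. The names `wire`, `ENCODERS`, `serialize`, `Ping`, `InvVect`, `Inv`, `obj_type` and `obj_hash` are all made up for illustration. Only message payloads are encoded; the Bitcoin message header (magic, command, length, checksum) is left out.

```python
import struct
from dataclasses import dataclass, field, fields


def encode_var_int(n: int) -> bytes:
    """Encode Bitcoin's variable-length integer (1, 3, 5 or 9 bytes)."""
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def encode_hash256(value: bytes) -> bytes:
    """Pass a 32-byte hash through unchanged, rejecting other lengths."""
    if len(value) != 32:
        raise ValueError("hash must be exactly 32 bytes")
    return bytes(value)


# Hypothetical mapping from a per-field "wire" annotation to an encoder.
# All fixed-width integers are little-endian, as in the Bitcoin protocol.
ENCODERS = {
    "uint32": lambda v: struct.pack("<I", v),
    "uint64": lambda v: struct.pack("<Q", v),
    "var_int": encode_var_int,
    "hash256": encode_hash256,
}


def wire(kind, repeated=False):
    """Attach wire-format metadata to a field (stand-in for a custom option)."""
    return field(metadata={"wire": kind, "repeated": repeated})


@dataclass
class Ping:
    nonce: int = wire("uint64")


@dataclass
class InvVect:
    obj_type: int = wire("uint32")
    obj_hash: bytes = wire("hash256")


@dataclass
class Inv:
    # A repeated field is written as a var_int count followed by the items.
    inventory: list = wire(InvVect, repeated=True)


def serialize(msg) -> bytes:
    """Walk the message's fields in declaration order and encode each one."""
    out = bytearray()
    for f in fields(msg):
        kind = f.metadata["wire"]
        value = getattr(msg, f.name)
        if f.metadata["repeated"]:
            out += encode_var_int(len(value))
            items = value
        else:
            items = [value]
        for item in items:
            if isinstance(kind, type):   # nested message type
                out += serialize(item)
            else:                        # scalar with a registered encoder
                out += ENCODERS[kind](item)
    return bytes(out)
```

For example, `serialize(Ping(nonce=1)).hex()` gives `0100000000000000`. With real protobuf messages the loop would iterate over the message descriptor's fields instead, and the per-field annotation would come from field options rather than dataclass metadata.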
On Mon, May 26, 2014 at 3:56 AM, Chris Beams ch...@beams.io wrote:
> Put another way, I'm interested in: 1. specifying an existing protocol using protobuf's .proto file syntax, and 2. reusing protobuf's .proto file parsing and code generation infrastructure, while 3. replacing protobuf's default encoding algorithm with one that conforms to an existing format.

Plugins allow you to insert code into existing generated code, but you won't be able to replace existing code. As I mentioned above, this is only well supported in C++. If this is not a concern to you, I would be happy to give you more details on how to implement such a plugin.
Re: [protobuf] Customizing protoc to support an existing wire format
On Mon, May 26, 2014 at 6:56 PM, Chris Beams ch...@beams.io wrote:
> At this point, I'd simply like to see whether anyone has been down this road before, and whether there are reasons for dismissing the idea completely before digging in too much further.

Check out https://code.google.com/p/protostuff/
It uses proto files for compilation/code generation but does not really implement the full proto spec. Custom options and annotations have been supported from the start (along with external compiler options) to aid code generation for specific languages/formats.

--
When the cat is away, the mouse is alone.
- David Yu