[jira] [Commented] (DRILL-5957) Wire protocol versioning, version negotiation
    [ https://issues.apache.org/jira/browse/DRILL-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249084#comment-16249084 ]

Paul Rogers commented on DRILL-5957:

[~tdunning], good points as usual!

The end user is really the driver of version compatibility support. Drill users have been fairly silent on this front, so perhaps version compatibility is not necessary in this particular case. But, based on experience with other products, folks in production often need to:

* Upgrade clients and servers on slightly different schedules, forcing the need for older clients to talk with newer servers or vice versa.
* If clients are installed on desktops (such as ODBC for Tableau), then the clients may actually be remote (or on a plane or at a customer site) at the time of server upgrades, forcing the need for older clients to connect to new servers.
* If users have multiple Drill clusters, then they may have different upgrade schedules, requiring that a single client be able to speak with multiple server versions. (In the most obvious case, a user may have version X in production, while trying out version (X+1) in a test environment prior to upgrade.)

Perhaps there are alternatives in the big data world. Maybe two versions of the Drill client can coexist in the same Tableau or other app? (For JDBC, this would mean that the clients must be in separate name spaces, much as SQuirreL does, but SQLline does not do.)

Will there be a performance hit to "transcode" vectors across versions when the version changes? Of course. The question is, is the temporary performance hit an acceptable cost to allow a staged upgrade? Or, would the users prefer to do an all-at-once upgrade in order to avoid the performance hit? (And, of course, the performance hit creates a very good incentive to upgrade...)

Finally, note that the "dbody" portion of a Drill message exists outside of the Protobuf structure.
A Drill message has five parts:

* Message ID
* P-body (Protobuf body) length
* P-body (serialized Protobuf content)
* D-body (data body) length
* D-body (serialized value vectors)

For this reason, Protobuf formats don't help us with vector serialization. Note that vector data may be gigabytes in size, so sending two copies is a worse performance impact than transcoding...

All this said, if staged upgrades and version compatibility are not a concern for Drill users at present, then there is no barrier to upgrading our vector formats; we just require that new clients be used with the new Drill version. This would, of course, be the simplest solution by far.

> Wire protocol versioning, version negotiation
> ---------------------------------------------
>
>                 Key: DRILL-5957
>                 URL: https://issues.apache.org/jira/browse/DRILL-5957
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>
> Drill has very limited support for evolving its wire protocol. As Drill
> becomes more widely deployed, this limitation will constrain the project's
> ability to rapidly evolve the wire protocol based on user experience to
> improve simplicity, performance or minimize resource use.
> Proposed is a standard mechanism to version the API and negotiate the API
> version between client and server at connect time. The focus here is between
> Drill clients (JDBC, ODBC) and the Drill server. The same mechanism can also
> be used between servers to support rolling upgrades.
> This proposal is an outline; it is not a detailed design. The purpose here is
> to drive understanding of the problem. Once we have that, we can focus on the
> implementation details.
> h4. Problem Statement
> The problem we wish to address here concerns both the _syntax_ and
> _semantics_ of API messages. Syntax concerns:
> * The set of messages and their sequence
> * The format of bytes on the wire
> * The format of message packets
> Semantics concerns:
> * The meaning of each field.
> * The layout of non-message data (vectors, in Drill.)
> We wish to introduce a system whereby both syntax and semantics can be
> evolved in a controlled, known manner such that:
> * A client of version x can connect to, and interoperate with, a server in a
> range of versions (x-y, x+z) for some values of y and z.
> For example, version x of the Drill client is deployed in the field. It must
> connect to the oldest Drill cluster available to that client. (That is, it
> must connect to servers up to y versions old.) During an upgrade, the server
> may be upgraded before the client. Thus, the client must also work with
> servers up to z versions newer than the client.
> If we wish to tackle rolling upgrades, then y and z can both be 1 for
> server-to-server APIs. A version x server will talk with (x-1) servers when
> the cluster upgrades to x, and will talk to (x+1) servers when th
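
As a rough illustration of the message layout listed in the comment above (message ID, P-body length, P-body, D-body length, D-body), the following Python sketch frames and parses such a message. The 4-byte big-endian integers and the names `encode_message` and `decode_message` are assumptions made for the example; the thread does not specify how Drill actually encodes these fields. The point is only that the D-body travels as opaque bytes outside the Protobuf-encoded P-body, so Protobuf's field-level compatibility does not cover it.

```python
import struct

# Assumption: every integer field is a 4-byte big-endian unsigned int.
# This is an illustrative layout, not Drill's actual wire encoding.
_U32 = struct.Struct(">I")


def encode_message(message_id: int, pbody: bytes, dbody: bytes = b"") -> bytes:
    """Frame a message as: ID, P-body length, P-body, D-body length, D-body."""
    return b"".join([
        _U32.pack(message_id),
        _U32.pack(len(pbody)), pbody,   # serialized Protobuf content
        _U32.pack(len(dbody)), dbody,   # serialized value vectors (opaque here)
    ])


def _read_u32(frame: bytes, offset: int) -> tuple[int, int]:
    """Read one 4-byte integer and return (value, new_offset)."""
    if offset + _U32.size > len(frame):
        raise ValueError("frame truncated while reading an integer field")
    (value,) = _U32.unpack_from(frame, offset)
    return value, offset + _U32.size


def _read_bytes(frame: bytes, offset: int, length: int) -> tuple[bytes, int]:
    """Read `length` raw bytes and return (bytes, new_offset)."""
    end = offset + length
    if end > len(frame):
        raise ValueError("frame truncated while reading a body")
    return frame[offset:end], end


def decode_message(frame: bytes) -> tuple[int, bytes, bytes]:
    """Split a frame back into (message_id, pbody, dbody)."""
    message_id, offset = _read_u32(frame, 0)
    plen, offset = _read_u32(frame, offset)
    pbody, offset = _read_bytes(frame, offset, plen)
    dlen, offset = _read_u32(frame, offset)
    dbody, offset = _read_bytes(frame, offset, dlen)
    if offset != len(frame):
        raise ValueError("unexpected trailing bytes after the D-body")
    return message_id, pbody, dbody
```

Because the decoder never inspects the D-body, any change to the vector layout inside it is invisible at this level; a receiver has to learn the vector format some other way, which is the gap the version negotiation discussion is about.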
[jira] [Commented] (DRILL-5957) Wire protocol versioning, version negotiation
    [ https://issues.apache.org/jira/browse/DRILL-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249063#comment-16249063 ]

Ted Dunning commented on DRILL-5957:

This suggestion has the virtue that only breaking changes will cause a version update, but it still has the problem that the version has to move no matter what part of the protocol changes. This is reminiscent of the old CORBA versioning nightmares.

Also, is there really any way to negotiate the value vector format without having a reformatting step inserted with a fairly catastrophic performance hit?

I don't see a consideration of the cost of maintaining old version compatibility, either. If old client versions work, then there will be no incentive to upgrade. That will increase pressure to keep adding multiple protocol support to the server and will seemingly lock down any real progress just as much as client/server lockstepping.

It seems that the short term desire here is to allow the vector format to change. What about making the current dvector parts be optional and adding alternative (optional) dvector parts in new formats? This effectively allows versioning of only the dvector stuff, leaving all the rest of the protocol to be soft-versioned as is currently done. The client advertised version could be used to trigger one format or the other and the incentive to upgrade is in the form of much slower transfer for the old format due to transcoding.

> Wire protocol versioning, version negotiation
> ---------------------------------------------
>
>                 Key: DRILL-5957
>                 URL: https://issues.apache.org/jira/browse/DRILL-5957
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>
> Drill has very limited support for evolving its wire protocol. As Drill
> becomes more widely deployed, this limitation will constrain the project's
> ability to rapidly evolve the wire protocol based on user experience to
> improve simplicity, performance or minimize resource use.
> Proposed is a standard mechanism to version the API and negotiate the API
> version between client and server at connect time. The focus here is between
> Drill clients (JDBC, ODBC) and the Drill server. The same mechanism can also
> be used between servers to support rolling upgrades.
> This proposal is an outline; it is not a detailed design. The purpose here is
> to drive understanding of the problem. Once we have that, we can focus on the
> implementation details.
> h4. Problem Statement
> The problem we wish to address here concerns both the _syntax_ and
> _semantics_ of API messages. Syntax concerns:
> * The set of messages and their sequence
> * The format of bytes on the wire
> * The format of message packets
> Semantics concerns:
> * The meaning of each field.
> * The layout of non-message data (vectors, in Drill.)
> We wish to introduce a system whereby both syntax and semantics can be
> evolved in a controlled, known manner such that:
> * A client of version x can connect to, and interoperate with, a server in a
> range of versions (x-y, x+z) for some values of y and z.
> For example, version x of the Drill client is deployed in the field. It must
> connect to the oldest Drill cluster available to that client. (That is, it
> must connect to servers up to y versions old.) During an upgrade, the server
> may be upgraded before the client. Thus, the client must also work with
> servers up to z versions newer than the client.
> If we wish to tackle rolling upgrades, then y and z can both be 1 for
> server-to-server APIs. A version x server will talk with (x-1) servers when
> the cluster upgrades to x, and will talk to (x+1) servers when the cluster is
> upgraded to version (x+1).
> h4. Current State
> Drill currently provides some ad-hoc version compatibility:
> * Slow change. Drill's APIs have not changed much since Drill 1.0, thereby
> avoiding the issue.
> * Protobuf support. Drill uses Protobuf for message bodies, leveraging that
> format's ability to absorb the addition or deprecation of individual fields.
> * API version number. The API holds a version number, though the code to use
> it is rather ad-hoc.
> The above has allowed clever coding to handle some version changes, but each
> is a one-off, ad-hoc collision. The recent security work is an example that,
> with enough effort, ad-hoc solutions can be found.
> The above cannot handle:
> * Change in the message order
> * Change in the "pbody/dbody" structure of each message.
> * Change in the structure of serialized value vectors.
> As a result, the current structure prevents any change to Drill's core
> mechanism, value vectors, as there is no way for clients and servers to
> negotiate the vector wire format. For example, Drill cannot adopt Arrow
> because a pre-Arrow client would n
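
The compatibility rule in the issue description, that a client of version x interoperates with servers in the range (x-y, x+z), can be sketched as a connect-time check. This is a minimal Python sketch, not Drill's mechanism: the integer version numbers, the inclusive reading of the range, the idea that each peer advertises a range, and the names `VersionRange`, `supported_range`, and `negotiate` are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class VersionRange(NamedTuple):
    """Inclusive range of API versions a peer is willing to speak (hypothetical)."""
    lowest: int
    highest: int


def supported_range(x: int, y: int, z: int) -> VersionRange:
    """Range for a peer of version x that accepts y older and z newer versions."""
    return VersionRange(x - y, x + z)


def negotiate(client: VersionRange, server: VersionRange) -> Optional[int]:
    """Pick the highest version both peers support, or None if there is none."""
    low = max(client.lowest, server.lowest)
    high = min(client.highest, server.highest)
    return high if low <= high else None


# Rolling upgrade case from the description: y = z = 1 for server-to-server APIs.
old_server = supported_range(4, 1, 1)   # speaks versions 3..5
new_server = supported_range(5, 1, 1)   # speaks versions 4..6
agreed = negotiate(old_server, new_server)
```

With y and z both 1, a version 4 server and a version 5 server overlap on versions 4 and 5, so the sketch settles on 5; peers whose ranges do not overlap get `None` and would have to refuse the connection.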