GitHub user paul-rogers commented on the pull request: https://github.com/apache/drill/pull/474#issuecomment-208959253

Some other design issues.

The idea of a rolling upgrade presupposes that we can shut down a Drillbit, bring up a new one, and the cluster keeps running. But today, bringing down a Drillbit causes all in-flight queries on that node to fail. There is no way to mark a node as "quiescent" (up, but not accepting new work). So a rolling upgrade today would entail a long series of query failures as we replace each of, say, 20 or 50 nodes. In fact, it is less disruptive to take the cluster down, push the upgrade, and bring it back up. (See DRILL-4286.)

Back on testing: testing is essential. Support for +/-1 feature compatibility is not helpful unless someone (other than the user) can certify that it works. If the user has to do the checking, the feature adds little: it is safer just to do a full upgrade.

To emphasize an earlier point: there are two distinct issues. One is a managed cluster upgrade, which the admin can perform with the help of a management tool. The other is the many Drill clients spread across desktops: that is a classic desktop software upgrade. Some of those machines might be on planes, others locked in desks while their owners are on vacation. Let's think about how to upgrade JDBC drivers and the like given this reality.

Is the compatibility policy based on release numbers or on time? As an admin, can I expect a three-month window for upgrades? Or will it sometimes be one month and sometimes four months, depending on who changes what? Should we have a time-based policy?
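To make the "quiescent" idea concrete, here is a minimal Java sketch of how a single node might drain before an upgrade: once quiesced it refuses new queries, lets in-flight ones finish, and only then reports that it is safe to stop. Every name here (`QuiescentDrillbitSketch`, `submitQuery`, `beginQuiesce`, `readyForShutdown`, `shutDown`) is hypothetical and not part of Drill's API; the synchronized, in-memory design is an assumption for illustration only, not a description of how Drill handles shutdown.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of a "quiescent" node state for rolling upgrades.
 * None of these names exist in Drill; they only illustrate the idea.
 */
public class QuiescentDrillbitSketch {

    /** Lifecycle states assumed for this sketch. */
    enum State { ONLINE, QUIESCENT, OFFLINE }

    private State state = State.ONLINE;
    private final Set<String> inFlight = new HashSet<>();

    /** Accepts new work only while ONLINE; false tells the caller to route elsewhere. */
    public synchronized boolean submitQuery(String queryId) {
        if (state != State.ONLINE) {
            return false;
        }
        inFlight.add(queryId);
        return true;
    }

    /** Marks an in-flight query as finished. */
    public synchronized void completeQuery(String queryId) {
        inFlight.remove(queryId);
    }

    /** Stops accepting new work but lets in-flight queries run to completion. */
    public synchronized void beginQuiesce() {
        if (state == State.ONLINE) {
            state = State.QUIESCENT;
        }
    }

    /** True once the node is quiescent and has no in-flight queries left. */
    public synchronized boolean readyForShutdown() {
        return state == State.QUIESCENT && inFlight.isEmpty();
    }

    /** Goes OFFLINE only when drained; returns whether the shutdown happened. */
    public synchronized boolean shutDown() {
        if (!readyForShutdown()) {
            return false;
        }
        state = State.OFFLINE;
        return true;
    }

    public synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        QuiescentDrillbitSketch node = new QuiescentDrillbitSketch();
        node.submitQuery("q1");
        node.beginQuiesce();
        System.out.println(node.submitQuery("q2"));  // false: new work refused
        System.out.println(node.readyForShutdown()); // false: q1 still running
        node.completeQuery("q1");
        System.out.println(node.shutDown());         // true: drained, safe to upgrade
    }
}
```

In this sketch, a rolling upgrade would call `beginQuiesce()` on one node, wait until `readyForShutdown()` returns true, replace the node, and move on to the next, so in-flight queries complete instead of failing.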