Github user paul-rogers commented on the pull request:

    https://github.com/apache/drill/pull/474#issuecomment-208959253
  
    Some other design issues. The idea of a rolling upgrade presupposes that 
we can shut down a Drillbit, bring up a new one, and the cluster keeps running. 
But, today, bringing down a Drillbit causes all in-flight queries on that node 
to fail. There is no way to mark a node as "quiescent" (up, but not accepting 
new work). So a rolling upgrade today would entail a long series of query 
failures as we replace each of, say, 20 or 50 nodes. In fact, it is less 
disruptive to take the cluster down, push an upgrade, and bring it back up. 
(See DRILL-4286.)
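
    To make the "quiescent" idea concrete, here is a minimal sketch of the 
state machine I have in mind. It is illustrative Python, not Drill code: the 
names NodeState, Node, submit, quiesce and complete are all hypothetical, and 
a real Drillbit would also need to advertise its state to the rest of the 
cluster.

```python
from enum import Enum


class NodeState(Enum):
    ONLINE = "online"        # accepts new queries
    QUIESCENT = "quiescent"  # up, finishes in-flight work, rejects new work
    OFFLINE = "offline"      # drained; safe to shut down and upgrade


class Node:
    """Hypothetical model of one node; not Drill's actual Drillbit API."""

    def __init__(self, name):
        self.name = name
        self.state = NodeState.ONLINE
        self.in_flight = set()

    def submit(self, query_id):
        """Accept a query only while ONLINE; return whether it was accepted."""
        if self.state is not NodeState.ONLINE:
            return False
        self.in_flight.add(query_id)
        return True

    def quiesce(self):
        """Stop taking new work; go offline at once if already idle."""
        if self.state is NodeState.ONLINE:
            self.state = NodeState.QUIESCENT
            self._maybe_go_offline()

    def complete(self, query_id):
        """Record a finished query; a drained quiescent node goes offline."""
        self.in_flight.discard(query_id)
        self._maybe_go_offline()

    def _maybe_go_offline(self):
        if self.state is NodeState.QUIESCENT and not self.in_flight:
            self.state = NodeState.OFFLINE
```

    With something like this, an upgrade tool could call quiesce() on one 
node, wait for it to reach OFFLINE, replace it, and move on, without failing 
the queries that were already running there.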
    
    Back on testing: testing is essential. A feature that allows +/-1 version 
compatibility is not helpful unless someone (other than the user) can certify 
that it works. If the user has to do the checking, then the feature is not 
very helpful: it is safer just to do a full upgrade.
    
    To emphasize an earlier point: there are two distinct issues. One is a 
managed cluster upgrade (the admin can do it with the help of a management 
tool). The other is the many Drill clients spread across desktops: that is a 
classic desktop software upgrade. Some might be on planes, others locked in 
desks while someone is on vacation. Let's think about how to upgrade JDBC 
drivers and the like given this reality.
    
    Is the compatibility policy number-based or time-based? As an admin, can I 
expect to have a three-month window for upgrades? Or will it sometimes be one 
month and other times four months, depending on who changes what? Should we 
have a time-based policy?
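
    To illustrate why I am asking, here is a rough sketch of the difference. 
It assumes, purely for illustration, that under a number-based policy the 
admin's upgrade window is simply the gap between consecutive releases. The 
function names, the release dates and the 90-day default are all made up.

```python
from datetime import date, timedelta


def number_based_windows(release_dates):
    """Days between consecutive releases.

    Assumption: under a number-based (+/-1) policy, this gap is how long
    an admin has to upgrade before the next version bump arrives.
    """
    return [
        (later - earlier).days
        for earlier, later in zip(release_dates, release_dates[1:])
    ]


def time_based_deadline(release_date, window_days=90):
    """Date by which clients must upgrade under a fixed time window."""
    return release_date + timedelta(days=window_days)


# Made-up release dates, only to show how uneven the windows can be.
releases = [date(2016, 1, 1), date(2016, 2, 1), date(2016, 6, 1)]
print(number_based_windows(releases))
print(time_based_deadline(releases[0]))
```

    The number-based windows vary with the release cadence, while the 
time-based deadline is something an admin can plan around in advance.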


