Makes sense to postpone the debate :) Will look forward to the proposal.

On Tuesday, April 12, 2016, Zelaine Fong <zf...@maprtech.com> wrote:
> As we discussed at this morning's hangout, Jacques took the action to put together a strawman compatibility points document. Would it be better to wait for that document before we debate this further?
>
> -- Zelaine
>
> On Tue, Apr 12, 2016 at 4:39 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
>
> > I agree with Paul, too. Perfect compatibility would be great. I recognize the issues that a version break could cause. These are some of the issues that I believe require a version break to address:
> > - Support nulls in lists.
> > - Distinguish null maps from empty maps.
> > - Distinguish null arrays from empty arrays.
> > - Support sparse maps (analogous to Parquet maps instead of our current approach analogous to structs in Parquet lingo).
> > - Clean up decimal and enable it by default.
> > - Support full Avro <> Parquet roundtrip (and Parquet files generated by other tools).
> > - Enable union type by default.
> > - Improve execution performance of nullable values.
> >
> > I think these things need to be addressed in the 2.x line (let's say that is ~12 months). This is all about tradeoffs which is why I keep asking people to provide concrete impact. If you think at least one of these should be resolved, you're arguing for breaking wire compatibility between 1.x and 2.x.
> >
> > So let's get concrete:
> >
> > - How many users are running multiple clusters and using a single client to connect to them?
> > - What BI tools are most users using? What is the primary driver they are using?
> > - What BI tools are packaging a Drill driver? If any, what is the update process and lead time?
> > - How many users are skipping multiple Drill versions (e.g. going from 1.2 to 1.6)? (Beyond the MapR tick-tock pattern)
> > - How many users are delaying driver upgrade substantially? Are there customers using the 1.0 driver?
> > - What is the average number of deployed clients per Drillbit cluster?
> >
> > These are some of the things that need to be evaluated to determine whether we choose to implement a compatibility layer or simply make a full break. (And in reality, I'm not sure we have the resources to build and carry a complex compatibility layer for these changes.)
> >
> > Whatever the policy we agree upon for future commitments to the user base, we're in a situation where there are very important reasons to move the codebase forward and change the wire protocol for 2.x.
> >
> > I think it is noble to strive towards backwards compatibility. We should always do this. However, I also think that--especially early in a product's life--it is better to resolve technical debt issues and break a few eggs than defer and carry a bunch of extra code around.
> >
> > Yes, it can suck for users. Luckily, we should also be giving users a bunch of positive reasons that it is worth upgrading and dealing with this version break. These include better perf, better compatibility with other tools, union type support, faster BI tool behaviors and a number of other things.
> >
> > I for one vote for moving forward and making sure that the 2.x branch is the highest quality and best version of Drill yet rather than focusing on minimizing the upgrade cost. All upgrades are a cost/benefit analysis. Drill is too young to focus on only minimizing the cost. We should be working to make sure the other part of the equation (benefit) is where we're spending the vast majority of our time.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Tue, Apr 12, 2016 at 3:38 PM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> >
> > > I agree with Paul. Great points.
> > > I would also add the partners aspect to it. Majority of Drill users use it in conjunction with a BI tool.
> > >
> > > -Neeraja
> > >
> > > On Tue, Apr 12, 2016 at 3:34 PM, Paul Rogers <prog...@maprtech.com> wrote:
> > >
> > > > Hi Jacques,
> > > >
> > > > My two cents…
> > > >
> > > > The unfortunate reality is that enterprise customers move slowly. There is a delay in the time it takes for end users to upgrade to a new release. When a third-party tool must also upgrade, the delay becomes even longer.
> > > >
> > > > At a high level, we need to provide a window of time in which old/new clients work with old/new servers. I may have a 1.6 client. The cluster upgrades to 1.8. I need time to upgrade my client to 1.8, especially if I have to wait for the vendor to provide a new package.
> > > >
> > > > If I connect to two clusters, I may upgrade my client to 1.8 for one, but I still need to connect to 1.6 for the other if they upgrade on different schedules.
> > > >
> > > > This is exactly why we need to figure out a policy: how do we give users a sufficient window of time to complete upgrades, even across the 1.x/2.x boundary?
> > > >
> > > > The cost of not providing such a window? Broken production systems, unpleasant escalations and unhappy customers.
> > > >
> > > > Thanks,
> > > >
> > > > - Paul
> > > >
> > > > > On Apr 12, 2016, at 3:14 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > > > >
> > > > >>> What I am suggesting is that we need to maintain backward compatibility with a defined set of 1.x version clients when Drill 2.0 version is out.
> > > > >
> > > > > I'm asking you to be concrete on why. There is definitely a cost to maintaining this compatibility. What are the real costs if we don't?
> > > > >
> > > > > --
> > > > > Jacques Nadeau
> > > > > CTO and Co-Founder, Dremio
> > > > >
> > > > > On Wed, Apr 6, 2016 at 9:21 AM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> > > > >
> > > > >> Jacques
> > > > >> can you elaborate on what you mean by 'internal' implementation changes but maintain external API?
> > > > >> I thought that changes that are being discussed here are the Drill client library changes.
> > > > >> What I am suggesting is that we need to maintain backward compatibility with a defined set of 1.x version clients when Drill 2.0 version is out.
> > > > >>
> > > > >> Neeraja
> > > > >>
> > > > >> On Tue, Apr 5, 2016 at 12:12 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > > > >>
> > > > >>> Thanks for bringing this up. BI compatibility is super important.
> > > > >>>
> > > > >>> The discussions here are primarily about internal implementation changes as opposed to external API changes. From a BI perspective, I think (hope) everyone shares the goal of having zero (to minimal) changes in terms of ODBC and JDBC behaviors in v2. The items outlined in DRILL-4417 are also critical to strong BI adoption as numerous patterns right now are suboptimal and we need to get them improved.
> > > > >>>
> > > > >>> In terms of your request of the community, it makes sense to have a strategy around this. It sounds like you have a bunch of considerations that should be weighed but your presentation doesn't actually share the concrete details. To date, there has been no formal consensus or commitment to any particular compatibility behavior.
> > > > >>> We've had an informal "don't change wire compatibility within a major version". If we are going to have a rich dialog about pros and cons of different approaches, we need to make sure that everybody has the same understanding of the dynamics. For example:
> > > > >>>
> > > > >>> Are you saying that someone has packaged the Apache Drill drivers in their BI solution? If so, what version? Is this the Apache release artifact or a custom version? Has someone certified them? Did anyone commit a particular compatibility pattern to a BI vendor on behalf of the community?
> > > > >>>
> > > > >>> To date, I'm not aware of any of these types of decisions being discussed in the community so it is hard to evaluate how important they are versus other things. Knowing that DRILL-4417 is outstanding and critical to the best BI experience, I think we should be very cautious of requiring long-term support of the existing (internal) implementation. Guaranteeing ODBC and JDBC behaviors should be satisfactory for the vast majority of situations. Anything beyond this needs to have a very public cost/benefit tradeoff. In other words, please expose your thinking 100x more so that we can all understand the ramifications of different strategies.
> > > > >>>
> > > > >>> thanks!
> > > > >>> Jacques
> > > > >>>
> > > > >>> --
> > > > >>> Jacques Nadeau
> > > > >>> CTO and Co-Founder, Dremio
> > > > >>>
> > > > >>> On Tue, Apr 5, 2016 at 10:01 AM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> > > > >>>
> > > > >>>> Sorry for coming back to this thread late.
> > > > >>>> I have some feedback on the compatibility aspects of 2.0.
> > > > >>>>
> > > > >>>> We are working with a variety of BI vendors to certify Drill and provide native connectors for Drill. Having native access from BI tools helps with seamless experience for the users with performance and functionality. This work is in progress and they are (and will be) working with 1.x versions of Drill as part of the development because that's what we have now. Some of these connectors will be available before 2.0 and some of them can come in post 2.0 as certification is a long process. We don't want to be in a situation where the native connectors are just released by a certain BI vendor and the connector is immediately obsolete or doesn't work because we have 2.0 release out now.
> > > > >>>> So the general requirement should be that we maintain backward compatibility with a certain number of prior releases. This is very important for the success of the project and adoption by the ecosystem. I am happy to discuss further.
> > > > >>>>
> > > > >>>> -Neeraja
> > > > >>>>
> > > > >>>> On Tue, Apr 5, 2016 at 8:44 AM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > > > >>>>
> > > > >>>>> I'm going to take this as lazy consensus.
> > > > >>>>> I'll create the branch.
> > > > >>>>>
> > > > >>>>> Once created, all merges to the master (1.x branch) should also go to the v2 branch unless we have a discussion here that they aren't applicable. When committing, please make sure to commit to both locations.
> > > > >>>>>
> > > > >>>>> thanks,
> > > > >>>>> Jacques
> > > > >>>>>
> > > > >>>>> --
> > > > >>>>> Jacques Nadeau
> > > > >>>>> CTO and Co-Founder, Dremio
> > > > >>>>>
> > > > >>>>> On Sat, Mar 26, 2016 at 7:26 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > > > >>>>>
> > > > >>>>>> Re Compatibility:
> > > > >>>>>>
> > > > >>>>>> I actually don't even think 1.0 clients work with 1.6 server, do they?
> > > > >>>>>>
> > > > >>>>>> I would probably decrease the cross-compatibility requirement burden. A nice goal would be cross compatibility across an extended series of releases. However, given all the things we've learned in the last year, we shouldn't try to maintain more legacy than is necessary. As such, I propose that we consider the requirement of 2.0 to be:
> > > > >>>>>>
> > > > >>>>>> 1.lastX works with 2.firstX. (For example, if 1.8 is the last minor release of the 1.x series, 1.8 would work with 2.0.)
> > > > >>>>>>
> > > > >>>>>> This simplifies testing (we don't have to worry about things like does 1.1 work with 2.3, etc) and gives people an upgrade path as they desire. This also allows us to decide what pieces of the compatibility shim go in the 2.0 server versus the 1.lastX client.
> > > > >>>>>> (I actually lean towards allowing a full break between v1 and v2 server/client but understand that that level of coordination is hard in many organizations since analysts are separate from IT). Hopefully, what I'm proposing can be a good compromise between progress and deployment ease.
> > > > >>>>>>
> > > > >>>>>> Thoughts?
> > > > >>>>>>
> > > > >>>>>> Re: Branches/Dangers
> > > > >>>>>>
> > > > >>>>>> Good points on this, Julian.
> > > > >>>>>>
> > > > >>>>>> How about this:
> > > > >>>>>>
> > > > >>>>>> - small fixes and enhancements PRs should be made against v1
> > > > >>>>>> - new feature PRs should be made against v2
> > > > >>>>>> - v2 should continue to always pass all precommit tests during its life
> > > > >>>>>> - v2 becomes master in two months
> > > > >>>>>>
> > > > >>>>>> I definitely don't want to create instability in the v2 branch.
> > > > >>>>>>
> > > > >>>>>> The other option I see is we can only do bug fix releases and branch the current master into a maintenance branch and treat master as v2.
> > > > >>>>>>
> > > > >>>>>> Other ideas?
> > > > >>>>>>
> > > > >>>>>> --
> > > > >>>>>> Jacques Nadeau
> > > > >>>>>> CTO and Co-Founder, Dremio
> > > > >>>>>>
> > > > >>>>>> On Sat, Mar 26, 2016 at 6:07 PM, Julian Hyde <jh...@apache.org> wrote:
> > > > >>>>>>
> > > > >>>>>>> Do you plan to be doing significant development on both the v1 and v2 branches, and if so, for how long? I have been bitten badly by that pattern in the past.
> > > > >>>>>>> Developers put lots of unrelated, destabilizing changes into v2, it took longer than expected to stabilize v2, product management lost confidence in v2 and shifted resources back to v1, and v2 never caught up with v1.
> > > > >>>>>>>
> > > > >>>>>>> One important question: Which branch will you ask people to target for pull requests? v1, v2 or both? If they submit to v2, and v2 is broken, how will you know whether the patches are good?
> > > > >>>>>>>
> > > > >>>>>>> My recommendation is to choose one of the following: (1) put a strict time limit of say 2 months after which v2 would become the master branch (and v1 master would become a maintenance branch), or (2) make v2 focused on a particular architectural feature; create multiple independent feature branches with breaking API changes if you need to.
> > > > >>>>>>>
> > > > >>>>>>> Julian
> > > > >>>>>>>
> > > > >>>>>>>> On Mar 26, 2016, at 1:41 PM, Paul Rogers <prog...@maprtech.com> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>> Hi All,
> > > > >>>>>>>>
> > > > >>>>>>>> 2.0 is a good opportunity to enhance our ZK information. See DRILL-4543: Advertise Drill-bit ports, status, capabilities in ZooKeeper. This change will simplify YARN integration.
> > > > >>>>>>>>
> > > > >>>>>>>> This enhancement will change the “public API” in ZK. To Parth’s point, we can do so in a way that old clients work - as long as a Drill-bit uses default ports.
> > > > >>>>>>>>
> > > > >>>>>>>> I’ve marked this JIRA as a candidate for 2.0.
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks,
> > > > >>>>>>>>
> > > > >>>>>>>> - Paul
> > > > >>>>>>>>
> > > > >>>>>>>>> On Mar 24, 2016, at 4:11 PM, Parth Chandra <par...@apache.org> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>> What's our proposal for backward compatibility between 1.x and 2.x?
> > > > >>>>>>>>> My thoughts:
> > > > >>>>>>>>> Optional - Allow a mixture of 1.x and 2.x drillbits in a cluster.
> > > > >>>>>>>>> Required - 1.x clients should be able to talk to 2.x drillbits.
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Thu, Mar 24, 2016 at 8:55 AM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> There are some changes that either have reviews pending or are in progress that would require breaking changes to Drill core.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Examples Include:
> > > > >>>>>>>>>> DRILL-4455 (arrow integration)
> > > > >>>>>>>>>> DRILL-4417 (jdbc/odbc/rpc changes)
> > > > >>>>>>>>>> DRILL-4534 (improve null performance)
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I've created a new 2.0.0 release version in JIRA and moved these tasks to that umbrella.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I'd like to propose a new v2 release branch where we can start incorporating these changes without disrupting v1 stability and compatibility.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> --
> > > > >>>>>>>>>> Jacques Nadeau
> > > > >>>>>>>>>> CTO and Co-Founder, Dremio