As we discussed at this morning's hangout, Jacques took the action to put
together a strawman compatibility points document.  Would it be better to
wait for that document before we debate this further?

-- Zelaine

On Tue, Apr 12, 2016 at 4:39 PM, Jacques Nadeau <[email protected]> wrote:

> I agree with Paul, too. Perfect compatibility would be great. I recognize
> the issues that a version break could cause.  These are some of the issues
> that I believe require a version break to address:
> - Support nulls in lists.
> - Distinguish null maps from empty maps.
> - Distinguish null arrays from empty arrays.
> - Support sparse maps (analogous to Parquet maps instead of our current
> approach analogous to structs in Parquet lingo).
> - Clean up decimal and enable it by default.
> - Support full Avro <> Parquet roundtrip (and Parquet files generated by
> other tools).
> - Enable union type by default.
> - Improve execution performance of nullable values.
>
> I think these things need to be addressed in the 2.x line (let's say that
> is ~12 months). This is all about tradeoffs which is why I keep asking
> people to provide concrete impact. If you think at least one of these
> should be resolved, you're arguing for breaking wire compatibility between
> 1.x and 2.x.
>
> So let's get concrete:
>
> - How many users are running multiple clusters and using a single client to
> connect them?
> - What BI tools are most users using? What is the primary driver they are
> using?
> - What BI tools are packaging a Drill driver? If any, what is the update
> process and lead time?
> - How many users are skipping multiple Drill versions (e.g. going from 1.2
> to 1.6)? (Beyond the MapR tick-tock pattern)
> - How many users are delaying driver upgrade substantially? Are there
> customers using the 1.0 driver?
> - What is the average number of deployed clients per Drillbit cluster?
>
> These are some of the things that need to be evaluated to determine whether
> we choose to implement a compatibility layer or simply make a full break.
> (And in reality, I'm not sure we have the resources to build and carry a
> complex compatibility layer for these changes.)
>
> Whatever the policy we agree upon for future commitments to the user base,
> we're in a situation where there are very important reasons to move the
> codebase forward and change the wire protocol for 2.x.
>
> I think it is noble to strive towards backwards compatibility. We should
> always do this. However, I also think that--especially early in a product's
> life--it is better to resolve technical debt issues and break a few eggs
> than defer and carry a bunch of extra code around.
>
> Yes, it can suck for users. Luckily, we should also be giving users a bunch
> of positive reasons that it is worth upgrading and dealing with this
> version break. These include better perf, better compatibility with other
> tools, union type support, faster BI tool behaviors and a number of other
> things.
>
> I for one vote for moving forward and making sure that the 2.x branch is
> the highest quality and best version of Drill yet rather than focusing on
> minimizing the upgrade cost. All upgrades are a cost/benefit analysis.
> Drill is too young to focus on only minimizing the cost. We should be
> working to make sure the other part of the equation (benefit) is where
> we're spending the vast majority of our time.
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Apr 12, 2016 at 3:38 PM, Neeraja Rentachintala <
> [email protected]> wrote:
>
> > I agree with Paul. Great points.
> > I would also add the partners aspect to it. The majority of Drill users use
> it
> > in conjunction with a BI tool.
> >
> >
> > -Neeraja
> >
> > On Tue, Apr 12, 2016 at 3:34 PM, Paul Rogers <[email protected]>
> wrote:
> >
> > > Hi Jacques,
> > >
> > > My two cents…
> > >
> > > The unfortunate reality is that enterprise customers move slowly. There
> > is
> > > a delay in the time it takes for end users to upgrade to a new release.
> > > When a third-party tool must also upgrade, the delay becomes even
> longer.
> > >
> > > At a high level, we need to provide a window of time in which old/new
> > > clients work with old/new servers. I may have a 1.6 client. The cluster
> > > upgrades to 1.8. I need time to upgrade my client to 1.8 — especially
> if
> > I
> > > have to wait for the vendor to provide a new package.
> > >
> > > If I connect to two clusters, I may upgrade my client to 1.8 for one,
> but
> > > I still need to connect to 1.6 for the other if they upgrade on
> different
> > > schedules.
> > >
> > > This is exactly why we need to figure out a policy: how do we give
> users
> > a
> > > sufficient window of time to complete upgrades, even across the 1.x/2.x
> > > boundary?
> > >
> > > The cost of not providing such a window? Broken production systems,
> > > unpleasant escalations and unhappy customers.
> > >
> > > Thanks,
> > >
> > > - Paul
> > >
> > > > On Apr 12, 2016, at 3:14 PM, Jacques Nadeau <[email protected]>
> > wrote:
> > > >
> > > >>> What I am suggesting is that we need to maintain backward
> > > compatibility with
> > > > a defined set of 1.x version clients when Drill 2.0 version is out.
> > > >
> > > > I'm asking you to be concrete on why. There is definitely a cost to
> > > > maintaining this compatibility. What are the real costs if we don't?
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Wed, Apr 6, 2016 at 9:21 AM, Neeraja Rentachintala <
> > > > [email protected]> wrote:
> > > >
> > > >> Jacques
> > > >> can you elaborate on what you mean by 'internal' implementation
> > changes
> > > but
> > > >> maintain external API.
> > > >> I thought that changes that are being discussed here are the Drill
> > > client
> > > >> library changes.
> > > >> What I am suggesting is that we need to maintain backward
> > compatibility
> > > >> with a defined set of 1.x version clients when Drill 2.0 version is
> > out.
> > > >>
> > > >> Neeraja
> > > >>
> > > >> On Tue, Apr 5, 2016 at 12:12 PM, Jacques Nadeau <[email protected]
> >
> > > >> wrote:
> > > >>
> > > >>> Thanks for bringing this up. BI compatibility is super important.
> > > >>>
> > > >>> The discussions here are primarily about internal implementation
> > > changes
> > > >> as
> > > >>> opposed to external API changes. From a BI perspective, I think
> > (hope)
> > > >>> everyone shares the goal of having zero (to minimal) changes in
> terms
> > > of
> > > >>> ODBC and JDBC behaviors in v2. The items outlined in DRILL-4417 are
> > > also
> > > >>> critical to strong BI adoption as numerous patterns right now are
> > > >>> suboptimal and we need to get them improved.
> > > >>>
> > > >>> In terms of your request of the community, it makes sense to have a
> > > >>> strategy around this. It sounds like you have a bunch of
> > considerations
> > > >>> that should be weighed but your presentation doesn't actually share
> > > >>> the concrete details. To date, there has been no formal consensus
> or
> > > >>> commitment to any particular compatibility behavior. We've had an
> > > >> informal
> > > >>> "don't change wire compatibility within a major version". If we are
> > > going
> > > >>> to have a rich dialog about pros and cons of different approaches,
> we
> > > >> need
> > > >>> to make sure that everybody has the same understanding of the
> > dynamics.
> > > >> For
> > > >>> example:
> > > >>>
> > > >>> Are you saying that someone has packaged the Apache Drill drivers
> in
> > > >> their
> > > >>> BI solution? If so, what version? Is this the Apache release
> artifact
> > > or
> > > >> a
> > > >>> custom version? Has someone certified them? Did anyone commit a
> > > >> particular
> > > >>> compatibility pattern to a BI vendor on behalf of the community?
> > > >>>
> > > >>> To date, I'm not aware of any of these types of decisions being
> > > discussed
> > > >>> in the community so it is hard to evaluate how important they are
> > > versus
> > > >>> other things. Knowing that DRILL-4417 is outstanding and critical
> to
> > > the
> > > >>> best BI experience, I think we should be very cautious of requiring
> > > >>> long-term support of the existing (internal) implementation.
> > > Guaranteeing
> > > >>> ODBC and JDBC behaviors should be satisfactory for the vast
> majority
> > of
> > > >>> situations. Anything beyond this needs to have a very public
> > > cost/benefit
> > > >>> tradeoff. In other words, please expose your thinking 100x more so
> > that
> > > >> we
> > > >>> can all understand the ramifications of different strategies.
> > > >>>
> > > >>> thanks!
> > > >>> Jacques
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Jacques Nadeau
> > > >>> CTO and Co-Founder, Dremio
> > > >>>
> > > >>> On Tue, Apr 5, 2016 at 10:01 AM, Neeraja Rentachintala <
> > > >>> [email protected]> wrote:
> > > >>>
> > > >>>> Sorry for coming back to this thread late.
> > > >>>> I have some feedback on the compatibility aspects of 2.0.
> > > >>>>
> > > >>>> We are working with a variety of BI vendors to certify Drill and
> > > >> provide
> > > >>>> native connectors for Drill. Having native access from BI tools
> > helps
> > > >>> with
> > > >>>> seamless experience for the users with performance and
> > functionality.
> > > >>> This
> > > >>>> work is in progress and they are (and will be) working with 1.x
> > > >> versions
> > > >>> of
> > > >>>> Drill as part of the development because that's what we have now.
> > Some
> > > >> of
> > > >>>> these connectors will be available before 2.0 and some of them can
> > > come
> > > >>> in
> > > >>>> post 2.0 as certification is a long process. We don't want to be
> in
> > a
> > > >>>> situation where the native connectors are just released by certain
> > BI
> > > >>>> vendor and the connector is immediately obsolete or doesn't work
> > > >> because
> > > >>> we
> > > >>>> have 2.0 release out now.
> > > >>>> So the general requirement should be that we maintain backward
> > > >>>> compatibility with a certain number of prior releases. This is very
> > > >>> important
> > > >>>> for the success of the project and adoption by the ecosystem. I am
> > happy
> > > >> to
> > > >>>> discuss further.
> > > >>>>
> > > >>>> -Neeraja
> > > >>>>
> > > >>>> On Tue, Apr 5, 2016 at 8:44 AM, Jacques Nadeau <
> [email protected]>
> > > >>> wrote:
> > > >>>>
> > > >>>>> I'm going to take this as lazy consensus. I'll create the branch.
> > > >>>>>
> > > >>>>> Once created, all merges to the master (1.x branch) should also
> go
> > to
> > > >>> the
> > > >>>>> v2 branch unless we have a discussion here that they aren't
> > > >> applicable.
> > > >>>>> When committing, please make sure to commit to both locations.
> > > >>>>>
> > > >>>>> thanks,
> > > >>>>> Jacques
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Jacques Nadeau
> > > >>>>> CTO and Co-Founder, Dremio
> > > >>>>>
> > > >>>>> On Sat, Mar 26, 2016 at 7:26 PM, Jacques Nadeau <
> > [email protected]>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Re Compatibility:
> > > >>>>>>
> > > >>>>>> I actually don't even think 1.0 clients work with 1.6 server, do
> > > >>> they?
> > > >>>>>>
> > > >>>>>> I would probably decrease the cross-compatibility requirement
> > > >>> burden. A
> > > >>>>>> nice goal would be cross compatibility across an extended series
> > of
> > > >>>>>> releases. However, given all the things we've learned in the
> last
> > > >>> year,
> > > >>>>> we
> > > >>>>>> shouldn't try to maintain more legacy than is necessary. As
> such,
> > I
> > > >>>>> propose
> > > >>>>>> that we consider the requirement of 2.0 to be:
> > > >>>>>>
> > > >>>>>> 1.lastX works with 2.firstX. (For example, if 1.8 is the last
> > minor
> > > >>>>>> release of the 1.x series, 1.8 would work with 2.0.)
> > > >>>>>>
> > > >>>>>> This simplifies testing (we don't have to worry about things
> like
> > > >>> does
> > > >>>>> 1.1
> > > >>>>>> work with 2.3, etc) and gives people an upgrade path as they
> > > >> desire.
> > > >>>> This
> > > >>>>>> also allows us to decide what pieces of the compatibility shim
> go
> > > >> in
> > > >>>> the
> > > >>>>>> 2.0 server versus the 1.lastX client. (I actually lean towards
> > > >>>> allowing a
> > > >>>>>> full break between v1 and v2 server/client but understand that
> > that
> > > >>>> level
> > > >>>>>> of coordination is hard in many organizations since analysts are
> > > >>>> separate
> > > >>>>>> from IT). Hopefully, what I'm proposing can be a good compromise
> > > >>>> between
> > > >>>>>> progress and deployment ease.
> > > >>>>>>
> > > >>>>>> Thoughts?
> > > >>>>>>
> > > >>>>>> Re: Branches/Dangers
> > > >>>>>>
> > > >>>>>> Good points on this Julian.
> > > >>>>>>
> > > >>>>>> How about this:
> > > >>>>>>
> > > >>>>>> - small fixes and enhancements PRs should be made against v1
> > > >>>>>> - new feature PRs should be made against v2
> > > >>>>>> - v2 should continue to always pass all precommit tests during
> its
> > > >>> life
> > > >>>>>> - v2 becomes master in two months
> > > >>>>>>
> > > >>>>>> I definitely don't want to create instability in the v2 branch.
> > > >>>>>>
> > > >>>>>> The other option I see is we can only do bug fix releases and
> > > >> branch
> > > >>>> the
> > > >>>>>> current master into a maintenance branch and treat master as v2.
> > > >>>>>>
> > > >>>>>> Other ideas?
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> Jacques Nadeau
> > > >>>>>> CTO and Co-Founder, Dremio
> > > >>>>>>
> > > >>>>>> On Sat, Mar 26, 2016 at 6:07 PM, Julian Hyde <[email protected]>
> > > >>> wrote:
> > > >>>>>>
> > > >>>>>>> Do you plan to be doing significant development on both the v1
> > and
> > > >>> v2
> > > >>>>>>> branches, and if so, for how long? I have been bitten badly by
> > > >> that
> > > >>>>> pattern
> > > >>>>>>> in the past. Developers put lots of unrelated, destabilizing
> > > >> changes
> > > >>>>> into
> > > >>>>>>> v2, it took longer than expected to stabilize v2, product
> > > >> management
> > > >>>>> lost
> > > >>>>>>> confidence in v2 and shifted resources back to v1, and v2 never
> > > >>> caught
> > > >>>>> up
> > > >>>>>>> with v1.
> > > >>>>>>>
> > > >>>>>>> One important question: Which branch will you ask people to
> > target
> > > >>> for
> > > >>>>>>> pull requests? v1, v2 or both? If they submit to v2, and v2 is
> > > >>> broken,
> > > >>>>> how
> > > >>>>>>> will you know whether the patches are good?
> > > >>>>>>>
> > > >>>>>>> My recommendation is to choose one of the following: (1) put a
> > > >>> strict
> > > >>>>>>> time limit of say 2 months after which v2 would become the
> master
> > > >>>> branch
> > > >>>>>>> (and v1 master would become a maintenance branch), or (2) make
> v2
> > > >>>>> focused
> > > >>>>>>> on a particular architectural feature; create multiple
> > independent
> > > >>>>> feature
> > > >>>>>>> branches with breaking API changes if you need to.
> > > >>>>>>>
> > > >>>>>>> Julian
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> On Mar 26, 2016, at 1:41 PM, Paul Rogers <
> [email protected]>
> > > >>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Hi All,
> > > >>>>>>>>
> > > >>>>>>>> 2.0 is a good opportunity to enhance our ZK information. See
> > > >>>>>>> DRILL-4543: Advertise Drill-bit ports, status, capabilities in
> > > >>>>> ZooKeeper.
> > > >>>>>>> This change will simplify YARN integration.
> > > >>>>>>>>
> > > >>>>>>>> This enhancement will change the “public API” in ZK. To
> Parth’s
> > > >>>> point,
> > > >>>>>>> we can do so in a way that old clients work - as long as a
> > > >> Drill-bit
> > > >>>>> uses
> > > >>>>>>> default ports.
> > > >>>>>>>>
> > > >>>>>>>> I’ve marked this JIRA as a candidate for 2.0.
> > > >>>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>>
> > > >>>>>>>> - Paul
> > > >>>>>>>>
> > > >>>>>>>>> On Mar 24, 2016, at 4:11 PM, Parth Chandra <
> [email protected]>
> > > >>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> What's our proposal for backward compatibility between 1.x
> and
> > > >>> 2.x?
> > > >>>>>>>>> My thoughts:
> > > >>>>>>>>> Optional  -  Allow a mixture of 1.x and 2.x drillbits in a
> > > >>> cluster.
> > > >>>>>>>>> Required - 1.x clients should be able to talk to 2.x
> drillbits.
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Mar 24, 2016 at 8:55 AM, Jacques Nadeau <
> > > >>>> [email protected]>
> > > >>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> There are some changes that either have reviews pending or
> are
> > > >>> in
> > > >>>>>>> progress
> > > >>>>>>>>>> that would require breaking changes to Drill core.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Examples Include:
> > > >>>>>>>>>> DRILL-4455 (arrow integration)
> > > >>>>>>>>>> DRILL-4417 (jdbc/odbc/rpc changes)
> > > >>>>>>>>>> DRILL-4534 (improve null performance)
> > > >>>>>>>>>>
> > > >>>>>>>>>> I've created a new 2.0.0 release version in JIRA and moved
> > > >> these
> > > >>>>>>> tasks to
> > > >>>>>>>>>> that umbrella.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I'd like to propose a new v2 release branch where we can
> start
> > > >>>>>>>>>> incorporating these changes without disrupting v1 stability
> > > >> and
> > > >>>>>>>>>> compatibility.
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> --
> > > >>>>>>>>>> Jacques Nadeau
> > > >>>>>>>>>> CTO and Co-Founder, Dremio
> > > >>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>
