I agree with Paul, too. Perfect compatibility would be great. I recognize
the issues that a version break could cause.  These are some of the issues
that I believe require a version break to address:
- Support nulls in lists.
- Distinguish null maps from empty maps.
- Distinguish null arrays from empty arrays.
- Support sparse maps (analogous to Parquet maps instead of our current
approach analogous to structs in Parquet lingo).
- Clean up decimal and enable it by default.
- Support full Avro <> Parquet roundtrip (and Parquet files generated by
other tools).
- Enable union type by default.
- Improve execution performance of nullable values.
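
To make the first few bullets concrete, here is a rough sketch of what
"distinguish null from empty" and "nulls in lists" mean at the data-layout
level. It assumes an Arrow-style layout (a validity flag per list, an offsets
array, and a validity flag per element). It is not Drill's actual vector
code, and the function names are made up for illustration.

```python
from typing import List, Optional, Tuple

Row = Optional[List[Optional[int]]]
Encoded = Tuple[List[bool], List[int], List[bool], List[int]]


def encode_int_lists(rows: List[Row]) -> Encoded:
    """Encode rows as (list_validity, offsets, child_validity, child_values).

    Hypothetical helper: an Arrow-style layout is assumed, not Drill's own.
    """
    list_validity: List[bool] = []
    offsets: List[int] = [0]
    child_validity: List[bool] = []
    child_values: List[int] = []
    for row in rows:
        if row is None:
            # A null list: validity flag is False and it contributes no children.
            list_validity.append(False)
        else:
            # An empty list is valid and simply has start offset == end offset.
            list_validity.append(True)
            for item in row:
                child_validity.append(item is not None)
                # Null elements still occupy a slot; 0 is a placeholder value.
                child_values.append(0 if item is None else item)
        offsets.append(len(child_values))
    return list_validity, offsets, child_validity, child_values


def decode_int_lists(
    list_validity: List[bool],
    offsets: List[int],
    child_validity: List[bool],
    child_values: List[int],
) -> List[Row]:
    """Rebuild the original rows, keeping null, empty, and null-element cases apart."""
    rows: List[Row] = []
    for i, valid in enumerate(list_validity):
        if not valid:
            rows.append(None)
            continue
        start, end = offsets[i], offsets[i + 1]
        rows.append(
            [child_values[j] if child_validity[j] else None for j in range(start, end)]
        )
    return rows


rows = [None, [], [1, None, 3]]
encoded = encode_int_lists(rows)
print(decode_int_lists(*encoded) == rows)
```

Without the per-list validity flag, the first two rows would look the same
on the wire. That is why these items touch the wire format and are hard to
retrofit behind the existing protocol.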

I think these things need to be addressed in the 2.x line (let's say that
is ~12 months). This is all about tradeoffs, which is why I keep asking
people to provide concrete impact. If you think at least one of these
should be resolved, you're arguing for breaking wire compatibility between
1.x and 2.x.

So let's get concrete:

- How many users are running multiple clusters and using a single client to
connect to them?
- What BI tools are most users using? What is the primary driver they are
using?
- What BI tools are packaging a Drill driver? If any, what is the update
process and lead time?
- How many users are skipping multiple Drill versions (e.g. going from 1.2
to 1.6)? (Beyond the MapR tick-tock pattern)
- How many users are delaying driver upgrade substantially? Are there
customers using the 1.0 driver?
- What is the average number of deployed clients per Drillbit cluster?

These are some of the things that need to be evaluated to determine whether
we choose to implement a compatibility layer or simply make a full break.
(And in reality, I'm not sure we have the resources to build and carry a
complex compatibility layer for these changes.)

Whatever the policy we agree upon for future commitments to the user base,
we're in a situation where there are very important reasons to move the
codebase forward and change the wire protocol for 2.x.

I think it is noble to strive towards backwards compatibility. We should
always do this. However, I also think that--especially early in a product's
life--it is better to resolve technical debt issues and break a few eggs
than defer and carry a bunch of extra code around.

Yes, it can suck for users. Luckily, we should also be giving users a bunch
of positive reasons that it is worth upgrading and dealing with this
version break. These include better perf, better compatibility with other
tools, union type support, faster BI tool behaviors and a number of other
things.

I for one vote for moving forward and making sure that the 2.x branch is
the highest quality and best version of Drill yet rather than focusing on
minimizing the upgrade cost. All upgrades are a cost/benefit analysis.
Drill is too young to focus on only minimizing the cost. We should be
working to make sure the other part of the equation (benefit) is where
we're spending the vast majority of our time.



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Apr 12, 2016 at 3:38 PM, Neeraja Rentachintala <
nrentachint...@maprtech.com> wrote:

> I agree with Paul. Great points.
> I would also add the partners aspect to it. The majority of Drill users use it
> in conjunction with a BI tool.
>
>
> -Neeraja
>
> On Tue, Apr 12, 2016 at 3:34 PM, Paul Rogers <prog...@maprtech.com> wrote:
>
> > Hi Jacques,
> >
> > My two cents…
> >
> > The unfortunate reality is that enterprise customers move slowly. There
> is
> > a delay in the time it takes for end users to upgrade to a new release.
> > When a third-party tool must also upgrade, the delay becomes even longer.
> >
> > At a high level, we need to provide a window of time in which old/new
> > clients work with old/new servers. I may have a 1.6 client. The cluster
> > upgrades to 1.8. I need time to upgrade my client to 1.8 — especially if
> I
> > have to wait for the vendor to provide a new package.
> >
> > If I connect to two clusters, I may upgrade my client to 1.8 for one, but
> > I still need to connect to 1.6 for the other if they upgrade on different
> > schedules.
> >
> > This is exactly why we need to figure out a policy: how do we give users
> a
> > sufficient window of time to complete upgrades, even across the 1.x/2.x
> > boundary?
> >
> > The cost of not providing such a window? Broken production systems,
> > unpleasant escalations and unhappy customers.
> >
> > Thanks,
> >
> > - Paul
> >
> > > On Apr 12, 2016, at 3:14 PM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
> > >
> > >>> What I am suggesting is that we need to maintain backward
> > compatibility with
> > > a defined set of 1.x version clients when Drill 2.0 version is out.
> > >
> > > I'm asking you to be concrete on why. There is definitely a cost to
> > > maintaining this compatibility. What are the real costs if we don't?
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Wed, Apr 6, 2016 at 9:21 AM, Neeraja Rentachintala <
> > > nrentachint...@maprtech.com> wrote:
> > >
> > >> Jacques
> > >> can you elaborate on what you mean by 'internal' implementation
> changes
> > but
> > >> maintain external API.
> > >> I thought that changes that are being discussed here are the Drill
> > client
> > >> library changes.
> > >> What I am suggesting is that we need to maintain backward
> compatibility
> > >> with a defined set of 1.x version clients when Drill 2.0 version is
> out.
> > >>
> > >> Neeraja
> > >>
> > >> On Tue, Apr 5, 2016 at 12:12 PM, Jacques Nadeau <jacq...@dremio.com>
> > >> wrote:
> > >>
> > >>> Thanks for bringing this up. BI compatibility is super important.
> > >>>
> > >>> The discussions here are primarily about internal implementation
> > changes
> > >> as
> > >>> opposed to external API changes. From a BI perspective, I think
> (hope)
> > >>> everyone shares the goal of having zero (to minimal) changes in terms
> > of
> > >>> ODBC and JDBC behaviors in v2. The items outlined in DRILL-4417 are
> > also
> > >>> critical to strong BI adoption as numerous patterns right now are
> > >>> suboptimal and we need to get them improved.
> > >>>
> > >>> In terms of your request of the community, it makes sense to have a
> > >>> strategy around this. It sounds like you have a bunch of
> considerations
> > >>> that should be weighed but your presentation doesn't actually share
> > >>> the concrete details. To date, there has been no formal consensus or
> > >>> commitment to any particular compatibility behavior. We've had an
> > >> informal
> > >>> "don't change wire compatibility within a major version". If we are
> > going
> > >>> to have a rich dialog about pros and cons of different approaches, we
> > >> need
> > >>> to make sure that everybody has the same understanding of the
> dynamics.
> > >> For
> > >>> example:
> > >>>
> > >>> Are you saying that someone has packaged the Apache Drill drivers in
> > >> their
> > >>> BI solution? If so, what version? Is this the Apache release artifact
> > or
> > >> a
> > >>> custom version? Has someone certified them? Did anyone commit a
> > >> particular
> > >>> compatibility pattern to a BI vendor on behalf of the community?
> > >>>
> > >>> To date, I'm not aware of any of these types of decisions being
> > discussed
> > >>> in the community so it is hard to evaluate how important they are
> > versus
> > >>> other things. Knowing that DRILL-4417 is outstanding and critical to
> > the
> > >>> best BI experience, I think we should be very cautious of requiring
> > >>> long-term support of the existing (internal) implementation.
> > Guaranteeing
> > >>> ODBC and JDBC behaviors should be satisfactory for the vast majority
> of
> > >>> situations. Anything beyond this needs to have a very public
> > cost/benefit
> > >>> tradeoff. In other words, please expose your thinking 100x more so
> that
> > >> we
> > >>> can all understand the ramifications of different strategies.
> > >>>
> > >>> thanks!
> > >>> Jacques
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Jacques Nadeau
> > >>> CTO and Co-Founder, Dremio
> > >>>
> > >>> On Tue, Apr 5, 2016 at 10:01 AM, Neeraja Rentachintala <
> > >>> nrentachint...@maprtech.com> wrote:
> > >>>
> > >>>> Sorry for coming back to this thread late.
> > >>>> I have some feedback on the compatibility aspects of 2.0.
> > >>>>
> > >>>> We are working with a variety of BI vendors to certify Drill and
> > >> provide
> > >>>> native connectors for Drill. Having native access from BI tools
> helps
> > >>> with
> > >>>> seamless experience for the users with performance and
> functionality.
> > >>> This
> > >>>> work is in progress and they are (and will be) working with 1.x
> > >> versions
> > >>> of
> > >>>> Drill as part of the development because that's what we have now.
> Some
> > >> of
> > >>>> these connectors will be available before 2.0 and some of them can
> > come
> > >>> in
> > >>>> post 2.0 as certification is a long process. We don't want to be in
> a
> > >>>> situation where the native connectors are just released by certain
> BI
> > >>>> vendor and the connector is immediately obsolete or doesn't work
> > >> because
> > >>> we
> > >>>> have 2.0 release out now.
> > >>>> So the general requirement should be that we maintain backward
> > >>>> compatibility with a certain number of prior releases. This is very
> > >>> important
> > >>>> for the success of the project and adoption by the ecosystem. I am
> happy
> > >> to
> > >>>> discuss further.
> > >>>>
> > >>>> -Neeraja
> > >>>>
> > >>>> On Tue, Apr 5, 2016 at 8:44 AM, Jacques Nadeau <jacq...@dremio.com>
> > >>> wrote:
> > >>>>
> > >>>>> I'm going to take this as lazy consensus. I'll create the branch.
> > >>>>>
> > >>>>> Once created, all merges to the master (1.x branch) should also go
> to
> > >>> the
> > >>>>> v2 branch unless we have a discussion here that they aren't
> > >> applicable.
> > >>>>> When committing, please make sure to commit to both locations.
> > >>>>>
> > >>>>> thanks,
> > >>>>> Jacques
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Jacques Nadeau
> > >>>>> CTO and Co-Founder, Dremio
> > >>>>>
> > >>>>> On Sat, Mar 26, 2016 at 7:26 PM, Jacques Nadeau <
> jacq...@dremio.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Re Compatibility:
> > >>>>>>
> > >>>>>> I actually don't even think 1.0 clients work with 1.6 server, do
> > >>> they?
> > >>>>>>
> > >>>>>> I would probably decrease the cross-compatibility requirement
> > >>> burden. A
> > >>>>>> nice goal would be cross compatibility across an extended series
> of
> > >>>>>> releases. However, given all the things we've learned in the last
> > >>> year,
> > >>>>> we
> > >>>>>> shouldn't try to maintain more legacy than is necessary. As such,
> I
> > >>>>> propose
> > >>>>>> that we consider the requirement of 2.0 to be:
> > >>>>>>
> > >>>>>> 1.lastX works with 2.firstX. (For example, if 1.8 is the last
> minor
> > >>>>>> release of the 1.x series, 1.8 would work with 2.0.)
> > >>>>>>
> > >>>>>> This simplifies testing (we don't have to worry about things like
> > >>> does
> > >>>>> 1.1
> > >>>>>> work with 2.3, etc) and gives people an upgrade path as they
> > >> desire.
> > >>>> This
> > >>>>>> also allows us to decide what pieces of the compatibility shim go
> > >> in
> > >>>> the
> > >>>>>> 2.0 server versus the 1.lastX client. (I actually lean towards
> > >>>> allowing a
> > >>>>>> full break between v1 and v2 server/client but understand that
> that
> > >>>> level
> > >>>>>> or coordination is hard in many organizations since analysts are
> > >>>> separate
> > >>>>>> from IT). Hopefully, what I'm proposing can be a good compromise
> > >>>> between
> > >>>>>> progress and deployment ease.
> > >>>>>>
> > >>>>>> Thoughts?
> > >>>>>>
> > >>>>>> Re: Branches/Dangers
> > >>>>>>
> > >>>>>> Good points on this Julian.
> > >>>>>>
> > >>>>>> How about this:
> > >>>>>>
> > >>>>>> - small fixes and enhancements PRs should be made against v1
> > >>>>>> - new feature PRs should be made against v2
> > >>>>>> - v2 should continue to always pass all precommit tests during its
> > >>> life
> > >>>>>> - v2 becomes master in two months
> > >>>>>>
> > >>>>>> I definitely don't want to create instability in the v2 branch.
> > >>>>>>
> > >>>>>> The other option I see is we can only do bug fix releases and
> > >> branch
> > >>>> the
> > >>>>>> current master into a maintenance branch and treat master as v2.
> > >>>>>>
> > >>>>>> Other ideas?
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Jacques Nadeau
> > >>>>>> CTO and Co-Founder, Dremio
> > >>>>>>
> > >>>>>> On Sat, Mar 26, 2016 at 6:07 PM, Julian Hyde <jh...@apache.org>
> > >>> wrote:
> > >>>>>>
> > >>>>>>> Do you plan to be doing significant development on both the v1
> and
> > >>> v2
> > >>>>>>> branches, and if so, for how long? I have been bitten badly by
> > >> that
> > >>>>> pattern
> > >>>>>>> in the past. Developers put lots of unrelated, destabilizing
> > >> changes
> > >>>>> into
> > >>>>>>> v2, it took longer than expected to stabilize v2, product
> > >> management
> > >>>>> lost
> > >>>>>>> confidence in v2 and shifted resources back to v1, and v2 never
> > >>> caught
> > >>>>> up
> > >>>>>>> with v1.
> > >>>>>>>
> > >>>>>>> One important question: Which branch will you ask people to
> target
> > >>> for
> > >>>>>>> pull requests? v1, v2 or both? If they submit to v2, and v2 is
> > >>> broken,
> > >>>>> how
> > >>>>>>> will you know whether the patches are good?
> > >>>>>>>
> > >>>>>>> My recommendation is to choose one of the following: (1) put a
> > >>> strict
> > >>>>>>> time limit of say 2 months after which v2 would become the master
> > >>>> branch
> > >>>>>>> (and v1 master would become a maintenance branch), or (2) make v2
> > >>>>> focused
> > >>>>>>> on a particular architectural feature; create multiple
> independent
> > >>>>> feature
> > >>>>>>> branches with breaking API changes if you need to.
> > >>>>>>>
> > >>>>>>> Julian
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> On Mar 26, 2016, at 1:41 PM, Paul Rogers <prog...@maprtech.com>
> > >>>>> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi All,
> > >>>>>>>>
> > >>>>>>>> 2.0 is a good opportunity to enhance our ZK information. See
> > >>>>>>> DRILL-4543: Advertise Drill-bit ports, status, capabilities in
> > >>>>> ZooKeeper.
> > >>>>>>> This change will simplify YARN integration.
> > >>>>>>>>
> > >>>>>>>> This enhancement will change the “public API” in ZK. To Parth’s
> > >>>> point,
> > >>>>>>> we can do so in a way that old clients work - as long as a
> > >> Drill-bit
> > >>>>> uses
> > >>>>>>> default ports.
> > >>>>>>>>
> > >>>>>>>> I’ve marked this JIRA as a candidate for 2.0.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>>
> > >>>>>>>> - Paul
> > >>>>>>>>
> > >>>>>>>>> On Mar 24, 2016, at 4:11 PM, Parth Chandra <par...@apache.org>
> > >>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> What's our proposal for backward compatibility between 1.x and
> > >>> 2.x?
> > >>>>>>>>> My thoughts:
> > >>>>>>>>> Optional  -  Allow a mixture of 1.x and 2.x drillbits in a
> > >>> cluster.
> > >>>>>>>>> Required - 1.x clients should be able to talk to 2.x drillbits.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Mar 24, 2016 at 8:55 AM, Jacques Nadeau <
> > >>>> jacq...@dremio.com>
> > >>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> There are some changes that either have reviews pending or are
> > >>> in
> > >>>>>>> progress
> > >>>>>>>>>> that would require breaking changes to Drill core.
> > >>>>>>>>>>
> > >>>>>>>>>> Examples Include:
> > >>>>>>>>>> DRILL-4455 (arrow integration)
> > >>>>>>>>>> DRILL-4417 (jdbc/odbc/rpc changes)
> > >>>>>>>>>> DRILL-4534 (improve null performance)
> > >>>>>>>>>>
> > >>>>>>>>>> I've created a new 2.0.0 release version in JIRA and moved
> > >> these
> > >>>>>>> tasks to
> > >>>>>>>>>> that umbrella.
> > >>>>>>>>>>
> > >>>>>>>>>> I'd like to propose a new v2 release branch where we can start
> > >>>>>>>>>> incorporating these changes without disrupting v1 stability
> > >> and
> > >>>>>>>>>> compatibility.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> --
> > >>>>>>>>>> Jacques Nadeau
> > >>>>>>>>>> CTO and Co-Founder, Dremio
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
>
