Re: Roadmap to Kudu 1.0 and release cadence

Jean-Daniel Cryans Fri, 12 Feb 2016 08:30:33 -0800

Yeah, website or wiki.

Thanks Todd.


J-D

On Thu, Feb 11, 2016 at 8:07 PM, Todd Lipcon <[email protected]> wrote:

> Hey JD,
>
> +1 for the proposal, and thanks for volunteering to RM. Assuming that
> we reach some (lazy) consensus on this plan, it seems like the kind of
> thing we'd want to publish on our website somewhere as well (including
> links to relevant JIRAs).
>
> -Todd
>
> On Thu, Feb 11, 2016 at 3:41 PM, Jean-Daniel Cryans <[email protected]>
> wrote:
> > Hi Kudu devs,
> >
> > While we're in the process of releasing 0.7.0, I'd like to start discussing
> > a roadmap to 1.0 and a release cadence to get us there. What I’m proposing
> > here is my own aggregate of discussions with my teammates, community
> > members, and potential users. I’m also volunteering to be the RM for 1.0
> > and all the releases that lead up to it.
> >
> > Starting with “What’s 1.0 exactly?”, and having been on other projects and
> > knowing the pain of calling things "1.0", I strongly believe that we should
> > ship it as soon as possible. There are as many definitions of 1.0 as people
> > on this list, but to me it boils down to being a version of the software
> > that:
> >
> > - Enough people are willing to deploy in production.
> >
> > - Does at least one thing really well.
> >
> > Xiaomi is already in production with Kudu but my assessment is that the
> > following are required pieces to "get enough people":
> >
> > Replay cache <https://issues.cloudera.org/browse/KUDU-568>: We need to be
> > able to enforce exactly-once semantics. Right now if an Insert times out
> > for any reason it runs the risk of hitting an ALREADY_PRESENT row error.
> > It's hard to build something reliable on top of that.
> >
> > Finish multi-master <https://issues.cloudera.org/browse/KUDU-683>: Kudu can
> > run with multiple masters but there are enough rough edges that it's not
> > recommended, leaving us with a SPOF. This is a blocker for many people, and
> > SPOF sounds scary.
> >
> > Hardened DDL operations: We have a laundry list of known problems:
> > KUDU-915, KUDU-969, KUDU-99, KUDU-180, KUDU-791, KUDU-887. Altering a table
> > shouldn't corrupt data.
> >
> > Backup support: We need an easy way of getting data in and out of Kudu, and
> > I don't think it has to be fancy. The equivalent of HBase's CopyTable might
> > be enough.
> >
> > To support some use cases really well on Kudu, I think we need at least one
> > of the following:
> >
> > Client-side support for flexible partitioning: Especially with
> > hash-partitioning, both C++ and Java clients need to do a better job at
> > partition pruning. We also need some high-level APIs for query engines like
> > Impala, Drill, SparkSQL, and Hive to learn about the table layouts and
> > better plan local scans.
> >
> > Time series features: Two features are missing which make it difficult to
> > use Kudu for TS data. First, Kudu has no way to drop a complete range, so
> > users who want to remove past data and reclaim resources are unable to do
> > so. Second, Kudu does not have an efficient strategy for partitioning by
> > time. Tablets must be created up-front, so operators are forced to create
> > tablets covering time ranges in the far future. This means we need to be
> > able to create pre-split tables that don't cover the whole key range, and
> > to add new tablets and drop existing ones.
> >
> > Merge operations <https://issues.cloudera.org/browse/KUDU-1002>: We've
> > already seen people trying to get around the lack of upserts by doing an
> > Insert and then catching row errors (which is really bad without a replay
> > cache) or by doing reads first.
> >
> > Bulk load/basic transactions: Without going into multi-row low-latency
> > transactions, it seems we can satisfy a lot of use cases by first staging
> > inserts in the tablets and then making them visible atomically. Doing it
> > that way opens up new ways of optimizing the ingest path by running, for
> > example, the row duplication checks in batch.
> >
> > Full support for snapshot consistency
> > <https://issues.cloudera.org/browse/KUDU-430>: this is basically about
> > implementing the vision laid out here:
> > http://getkudu.io/docs/transaction_semantics.html
> >
> > This list is my perception of what the most needed features are. It is open
> > to additions from folks with different priorities. If we end up doing
> > something that’s not on this list then we would still meet the 1.0
> > definition of “doing at least one thing really well”.
> >
> >
> > For the cadence, I think we should aim for a release every two months so
> > that we continually deliver new features, improvements, and bug fixes while
> > keeping a short feedback loop. We'd stay in "beta" mode during that time
> > and would not do point releases unless we release something that's just not
> > usable.
> >
> > Finally, let's shoot for a Fall timeframe for Kudu 1.0 which means we’ll
> > have two more intermediate releases.
> >
> > What does the dev list think?
> >
> > J-D
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
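As a rough illustration of the replay cache item in the quoted proposal, here
is a minimal sketch of why a retried Insert can hit ALREADY_PRESENT and how
remembering results per request would avoid it. Everything in it (ToyTablet,
the client_id/seq_no request id, insert_with_lost_response) is hypothetical
and made up for this example; it is not Kudu's API or its actual design.

```python
from dataclasses import dataclass, field


class AlreadyPresent(Exception):
    """Raised when a row with the same key already exists."""


@dataclass
class ToyTablet:
    """Hypothetical in-memory stand-in for a tablet; not Kudu's API."""
    rows: dict = field(default_factory=dict)
    # Assumed shape: (client_id, seq_no) -> result of the first attempt.
    replay_cache: dict = field(default_factory=dict)

    def insert(self, client_id, seq_no, key, value, use_replay_cache=True):
        request_id = (client_id, seq_no)
        if use_replay_cache and request_id in self.replay_cache:
            # Retry of a request that was already applied: replay the result.
            return self.replay_cache[request_id]
        if key in self.rows:
            raise AlreadyPresent(key)
        self.rows[key] = value
        result = "OK"
        if use_replay_cache:
            self.replay_cache[request_id] = result
        return result


def insert_with_lost_response(tablet, use_replay_cache):
    """Apply an insert, pretend the response was lost, then retry it."""
    # First attempt is applied, but the client times out before seeing "OK".
    tablet.insert("client-1", 1, "row-a", 42, use_replay_cache)
    try:
        # The client retries with the same request id.
        return tablet.insert("client-1", 1, "row-a", 42, use_replay_cache)
    except AlreadyPresent:
        return "ALREADY_PRESENT"
```

Without the cache, the client cannot tell its own retry apart from a genuine
duplicate key. With it, the retry gets the original result and the row is
still written only once.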
