Hi JD, thanks for sending the proposal. +1 to releasing a Kudu 1.0 in the
Fall (sounds like this means around August or so), with periodic 0.x
releases in between.

Mike

On Fri, Feb 12, 2016 at 1:41 AM, Jean-Daniel Cryans <[email protected]>
wrote:

> Hi Kudu devs,
>
> While we're in the process of releasing 0.7.0, I'd like to start discussing
> a roadmap to 1.0 and a release cadence to get us there. What I’m proposing
> here is my own aggregate of discussions with my teammates, community
> members, and potential users. I’m also volunteering to be the RM for 1.0
> and all the releases that lead up to it.
>
> Starting with “What’s 1.0 exactly?”, and having been on other projects and
> knowing the pain of calling things "1.0", I strongly believe that we should
> ship it as soon as possible. There are as many definitions of 1.0 as there
> are people on this list, but to me it boils down to being a version of the
> software that:
>
> - Enough people are willing to deploy in production.
>
> - Does at least one thing really well.
>
> Xiaomi is already in production with Kudu, but my assessment is that the
> following are required pieces to "get enough people":
>
> Replay cache <https://issues.cloudera.org/browse/KUDU-568>: We need to be
> able to enforce exactly-once semantics. Right now, if an Insert times out
> for any reason, it risks hitting an ALREADY_PRESENT row error on retry.
> It's hard to build something reliable on top of that.
>
> Finish multi-master <https://issues.cloudera.org/browse/KUDU-683>: Kudu can
> run with multiple masters, but there are enough rough edges that it's not
> recommended, leaving us with a SPOF. This is a blocker for many people, and
> SPOF sounds scary.
>
> Hardened DDL operations: We have a laundry list of known problems:
> KUDU-915, KUDU-969, KUDU-99, KUDU-180, KUDU-791, KUDU-887. Altering a table
> shouldn't corrupt data.
>
> Backup support: We need an easy way of getting data in and out of Kudu. I
> don't think it has to be fancy; the equivalent of HBase's CopyTable might
> be enough.
>
> To support some use cases really well on Kudu, I think we need at
> least one of the following:
>
> Client-side support for flexible partitioning: Especially with
> hash-partitioning, both C++ and Java clients need to do a better job of
> partition pruning. We also need some high-level APIs for query engines like
> Impala, Drill, SparkSQL, and Hive to learn about the table layouts and
> better plan local scans.
>
> Time series features: Two features are missing which make it difficult to
> use Kudu for TS data. First, Kudu has no way to drop a complete range, so
> users who want to remove past data and reclaim resources are unable to do
> so. Second, Kudu does not have an efficient strategy for partitioning by
> time. Tablets must be created up-front, so operators are forced to create
> tablets covering time ranges in the far future. This means we need to be
> able to create pre-split tables that don't cover the whole key range, and to
> add new tablets and drop existing ones.
>
> Merge operations <https://issues.cloudera.org/browse/KUDU-1002>: We've
> already seen people trying to get around the lack of upserts by doing an
> Insert and then catching row errors (which is really bad without a replay
> cache) or by doing reads first.
>
> Bulk load/basic transactions: Without going into multi-row low-latency
> transactions, it seems we can satisfy a lot of use cases by first staging
> inserts in the tablets and then making them visible atomically. Doing it
> that way opens up new ways of optimizing the ingest path by running, for
> example, the row duplication checks in batch.
>
> Full support for snapshot consistency
> <https://issues.cloudera.org/browse/KUDU-430>: This is basically about
> implementing the vision laid out here:
> http://getkudu.io/docs/transaction_semantics.html
>
> This list is my perception of what the most needed features are. It is open
> to additions from folks with different priorities. If we end up doing
> something that’s not on this list, then we would still meet the 1.0
> definition of “doing at least one thing really well”.
>
>
> For the cadence, I think we should aim for a release every two months so
> that we continually deliver new features, improvements, and bug fixes while
> keeping a short feedback loop. We'd stay in "beta" mode during that time
> and would not do point releases unless we release something that's just not
> usable.
>
> Finally, let's shoot for a Fall timeframe for Kudu 1.0, which means we’ll
> have two more intermediate releases.
>
> What does the dev list think?
>
> J-D
>
