Hi JD, thanks for sending the proposal. +1 to releasing a Kudu 1.0 in the Fall (sounds like this means around August or so), with periodic 0.x releases in between.
Mike

On Fri, Feb 12, 2016 at 1:41 AM, Jean-Daniel Cryans wrote:
> Hi Kudu devs,
>
> While we're in the process of releasing 0.7.0, I'd like to start discussing
> a roadmap to 1.0 and a release cadence to get us there. What I'm proposing
> here is my own aggregate of discussions with my teammates, community
> members, and potential users. I'm also volunteering to be the RM for 1.0
> and all the releases that lead up to it.
>
> Starting with "What's 1.0 exactly?", and having been on other projects and
> knowing the pain of calling things "1.0", I strongly believe that we should
> ship it as soon as possible. There are as many definitions of 1.0 as people
> on this list, but to me it boils down to being a version of the software
> that:
>
> - Enough people are willing to deploy in production.
>
> - Does at least one thing really well.
>
> Xiaomi is already in production with Kudu, but my assessment is that the
> following are required pieces to "get enough people":
>
> Replay cache <https://issues.cloudera.org/browse/KUDU-568>: We need to be
> able to enforce exactly-once semantics. Right now, if an Insert times out
> for any reason, retrying it runs the risk of hitting an ALREADY_PRESENT row
> error. It's hard to build something reliable on top of that.
>
> Finish multi-master <https://issues.cloudera.org/browse/KUDU-683>: Kudu
> can run with multiple masters, but there are enough rough edges that it's
> not recommended, leaving us with a SPOF. This is a blocker for many people;
> also, SPOF sounds scary.
>
> Hardened DDL operations: We have a laundry list of known problems:
> KUDU-915, KUDU-969, KUDU-99, KUDU-180, KUDU-791, KUDU-887. Altering a table
> shouldn't corrupt data.
>
> Backup support: We need an easy way of getting data in and out of Kudu. I
> don't think it has to be fancy; the equivalent of HBase's CopyTable might
> be enough.
>
> In order to support some use cases on Kudu really well, I think we need at
> least one of the following:
>
> Client-side support for flexible partitioning: Especially with
> hash partitioning, both the C++ and Java clients need to do a better job at
> partition pruning. We also need some high-level APIs for query engines like
> Impala, Drill, SparkSQL, and Hive to learn about table layouts and
> better plan local scans.
>
> Time series features: Two missing features make it difficult to use
> Kudu for TS data. First, Kudu has no way to drop a complete range, so
> users who want to remove past data and reclaim resources are unable to do
> so. Second, Kudu does not have an efficient strategy for partitioning by
> time. Tablets must be created up front, so operators are forced to create
> tablets covering time ranges in the far future. This means we need to be
> able to create pre-split tables that don't cover the whole key range, and
> to add and drop tablets afterwards.
>
> Merge operations <https://issues.cloudera.org/browse/KUDU-1002>: We've
> already seen people trying to get around the lack of upserts by inserting
> and then catching row errors (which is really bad without a replay cache)
> or by doing reads first.
>
> Bulk load/basic transactions: Without going into multi-row low-latency
> transactions, it seems we can satisfy a lot of use cases by first staging
> inserts in the tablets and then making them visible atomically. Doing it
> that way also opens up new ways of optimizing the ingest path by, for
> example, running the row duplication checks in batch.
>
> Full support for snapshot consistency
> <https://issues.cloudera.org/browse/KUDU-430>: This is basically about
> implementing the vision laid out here:
> http://getkudu.io/docs/transaction_semantics.html
>
> This list is my perception of what the most needed features are. It is open
> to additions from folks with different priorities. If we end up doing
> something that's not on this list, we would still meet the 1.0
> definition of "doing at least one thing really well".
>
> For the cadence, I think we should aim for a release every two months so
> that we continually deliver new features, improvements, and bug fixes while
> keeping a short feedback loop. We'd stay in "beta" mode during that time
> and would not do point releases unless we release something that's just not
> usable.
>
> Finally, let's shoot for a Fall timeframe for Kudu 1.0, which means we'll
> have two more intermediate releases.
>
> What does the dev list think?
>
> J-D
>
