Yeah, website or wiki. Thanks Todd.
J-D

On Thu, Feb 11, 2016 at 8:07 PM, Todd Lipcon <[email protected]> wrote:
> Hey JD,
>
> +1 for the proposal, and thanks for volunteering to RM. Assuming that
> we reach some (lazy) consensus on this plan, it seems like the kind of
> thing we'd want to publish on our website somewhere as well (including
> links to relevant JIRAs).
>
> -Todd
>
> On Thu, Feb 11, 2016 at 3:41 PM, Jean-Daniel Cryans <[email protected]>
> wrote:
> > Hi Kudu devs,
> >
> > While we're in the process of releasing 0.7.0, I'd like to start
> > discussing a roadmap to 1.0 and a release cadence to get us there.
> > What I’m proposing here is my own aggregate of discussions with my
> > teammates, community members, and potential users. I’m also
> > volunteering to be the RM for 1.0 and all the releases that lead up
> > to it.
> >
> > Starting with “What’s 1.0 exactly?”, and having been on other projects
> > and knowing the pain of calling things "1.0", I strongly believe that
> > we should ship it as soon as possible. There are as many definitions
> > of 1.0 as people on this list, but to me it boils down to being a
> > version of the software that:
> >
> > - Enough people are willing to deploy in production.
> >
> > - Does at least one thing really well.
> >
> > Xiaomi is already in production with Kudu but my assessment is that
> > the following are required pieces to "get enough people":
> >
> > Replay cache <https://issues.cloudera.org/browse/KUDU-568>: We need to
> > be able to enforce exactly-once semantics. Right now if an Insert
> > times out for any reason it runs the chance of hitting an
> > ALREADY_PRESENT row error. It's hard to build something reliable on
> > top of that.
> >
> > Finish multi-master <https://issues.cloudera.org/browse/KUDU-683>:
> > Kudu can run with multiple masters but there are enough rough edges
> > that it's not recommended, leaving us with a SPOF. This is a blocker
> > for many people; also, SPOF sounds scary.
> >
> > Hardened DDL operations: We have a laundry list of known problems:
> > KUDU-915, KUDU-969, KUDU-99, KUDU-180, KUDU-791, KUDU-887. Altering a
> > table shouldn't corrupt data.
> >
> > Backup support: We need an easy way of getting data in and out of
> > Kudu; I don't think it has to be fancy. The equivalent of HBase's
> > CopyTable might be enough.
> >
> > In order to support some use cases on Kudu really well, I think we
> > need at least one of the following:
> >
> > Client-side support for flexible partitioning: Especially with
> > hash-partitioning, both C++ and Java clients need to do a better job
> > at partition pruning. We also need some high-level APIs for query
> > engines like Impala, Drill, SparkSQL, and Hive to learn about the
> > table layouts and better plan local scans.
> >
> > Time series features: Two features are missing which make it difficult
> > to use Kudu for TS data. First, Kudu has no way to drop a complete
> > range, so users who want to remove past data and reclaim resources are
> > unable to do so. Second, Kudu does not have an efficient strategy for
> > partitioning by time. Tablets must be created up-front, so operators
> > are forced to create tablets covering time ranges in the far future.
> > This means we need to be able to create pre-split tables that won't
> > cover the whole key range and be able to add and drop tablets.
> >
> > Merge operations <https://issues.cloudera.org/browse/KUDU-1002>: We've
> > already seen people trying to get around the lack of upserts by doing
> > an Insert and then catching row errors (which is really bad without a
> > replay cache) or doing reads first.
> >
> > Bulk load/basic transactions: Without going into multi-row low-latency
> > transactions, it seems we can satisfy a lot of use cases by first
> > staging inserts in the tablets and then making them visible atomically.
> > Doing it that way opens up new ways of optimizing the ingest path by
> > running, for example, the row duplication checks in batch.
> >
> > Full support for snapshot consistency
> > <https://issues.cloudera.org/browse/KUDU-430>: this is basically about
> > implementing the vision laid out here:
> > http://getkudu.io/docs/transaction_semantics.html
> >
> > This list is my perception of what the most needed features are. It is
> > open to additions from folks with different priorities. If we end up
> > doing something that’s not on this list then we would still meet the
> > 1.0 definition of “doing at least one thing really well”.
> >
> > For the cadence, I think we should aim for a release every two months
> > so that we continually deliver new features, improvements, and bug
> > fixes while keeping a short feedback loop. We'd stay in "beta" mode
> > during that time and would not do point releases unless we release
> > something that's just not usable.
> >
> > Finally, let's shoot for a Fall timeframe for Kudu 1.0 which means
> > we’ll have two more intermediate releases.
> >
> > What does the dev list think?
> >
> > J-D
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
