Really great to see an incubation proposal for HTrace. If you need another mentor, please consider me.
I don't think you need to list "HTrace is not the primary focus of any of the current list of contributors" as a risk. One can say that about many (perhaps the majority) of contributors to Apache projects. We would hope the incubation process develops a healthy community that sustains a level of contribution that keeps the project moving forward, as we would hope for all incubation candidates.

On Fri, Oct 31, 2014 at 4:06 PM, Roman Shaposhnik <r...@apache.org> wrote:

> Hi!
>
> I would like to propose HTrace to be considered for Apache Incubator. The proposal is attached and is also available on the wiki:
> https://wiki.apache.org/incubator/HTraceProposal
>
> Please let me know what you guys think, and don't hesitate to massage the proposal on the wiki based on the feedback from this thread.
>
> Thanks,
> Roman.
>
> == Abstract ==
> HTrace is a tracing framework intended for use with distributed systems written in Java.
>
> == Proposal ==
> HTrace is an aid for understanding system behavior and for reasoning about performance issues in distributed systems. HTrace is primarily a low impedance library that a Java distributed system can incorporate to generate ‘breadcrumbs’ or ‘traces’ along the path of execution, even as it crosses processes and machines. HTrace also includes various tools and glue for collecting, processing and ‘visualizing’ captured execution traces for analysis ex post facto of where time was spent and what resources were consumed.
>
> == Background ==
> Distributed systems are made up of multiple software components running on multiple computers connected by networks. Debugging or profiling operations run over non-trivial distributed systems -- figuring out execution paths and what services, machines, and libraries participated in the processing of a request -- can be involved.
>
> == Rationale ==
> Rather than have each distributed system build its own custom ‘tracing’ libraries, ideally all would use a single project that provides the necessary primitives and saves each project building its own visualizations and processing tools anew.
>
> Google described “...[a] large-scale distributed systems tracing infrastructure” in Dapper, a Large-Scale Distributed Systems Tracing Infrastructure. The paper tells a compelling story of what is possible when disparate systems standardize on a single tracing library and cooperate, ‘passing the baton’, filling out trace context as executions cross systems.
>
> HTrace aims to provide a rough equivalent in open source of the described core Dapper tools and library. As it is adopted by more projects, there will be a ‘network effect’ as HTrace will provide a more comprehensive view of activity on the cluster. For example, as HDFS gets HTrace support, we can connect this with the HTrace support in HBase to follow HBase requests as they enter HDFS.
>
> Given that the success of HTrace depends on its being integrated by many projects, HTrace should be perceived as unhampered, free of any commercial, political, or legal ‘taint’. Being an Apache project would help in this regard.
>
> == Initial Goals ==
> HTrace is a small project of narrow scope but with a grand vision:
> * Move the HTrace source and repository to Apache, a vendor-neutral location. Currently HTrace resides at a Cloudera-hosted repository.
> * Add past contributors as committers and institute Apache governance.
> * Evangelize and encourage HTrace diffusion. Initially we will continue to focus on the Hadoop space since that is where most of the initial contributors work and it is where HTrace has been initially deployed.
> * Build out the standalone visualization tool that ships with HTrace.
> * Build more community and add more committers.
>
> == Current Status ==
> Currently HTrace has a viable Java trace library that can be interpolated to create ‘traces’. The work that needs to be done on this library is mostly bug fixes, ease-of-use improvements, and performance tweaks. In the future, we may add libraries for other languages besides Java.
>
> HTrace has means of dumping traces to the filesystem, to Twitter’s Zipkin (a tracing sink and visualization system, https://github.com/twitter/zipkin), or to Apache HBase. Executions can be viewed either in Zipkin or in pygraph (https://code.google.com/p/python-graph/).
>
> Since the initial sprint in the summer of 2012, which saw HTrace patches proposed for Apache HDFS and committed to Apache HBase, development has been sporadic: mostly a single developer or two adding a feature or fixing bugs. HTrace is currently undergoing a new “spurt” of development, with the effort to get HTrace added to Apache HDFS revived and a new standalone viewing facility being added to HTrace itself.
>
> HTrace has been integrated by Apache Phoenix.
>
> === Meritocracy ===
> HTrace, up to now, has been run by Apache committers and PMC members. We want to build out a diverse developer and user community and run the HTrace project in the Apache way. Users and new contributors will be treated with respect and welcomed; they will earn merit in the project by tendering quality patches and support that move the project forward. Those with a proven track record of support and quality patches will be encouraged to become committers.
>
> === Community ===
> There are just a few developers involved at the moment. If our project is accepted by the Incubator, building community would be a primary initial goal.
>
> === Core Developers ===
> Core developers include Apache members and members of the Hadoop and HBase PMCs. Of those listed, all have contributed to HTrace. Half are from Cloudera. The remainder are Hortonworks, NTTData, Google, and Facebook employees.
>
> === Alignment ===
> HTrace has been integrated into Apache HBase and Apache Phoenix. Integration into Apache HDFS is currently being worked on. Approaching the Apache YARN project would be a likely next integration.
>
> == Known Risks ==
> As noted above, development has been sporadic up to now. It may continue so.
>
> HTrace is not the primary focus of any of the current list of contributors. It is for all a side effort. HTrace may lack sufficient impetus with such a state of affairs.
>
> For HTrace to tell a compelling story, it needs to be taken up by the significant projects that make up a traced distributed system. For example, if YARN and HBase take on HTrace but HDFS does not, then the HDFS portions of an end-to-end operation will be opaque, compromising our ability to tell a good story around an execution. Because the picture painted has gaps, HTrace may be left aside as ineffective.
>
> === Orphaned products ===
> The proposers have a vested interest in making HTrace succeed, driving its development and its insertion into projects we all work on. Its dispersion will shine light on difficult-to-understand interactions amongst the various systems we all work on. A working, integrated HTrace will add a useful debugging mechanism to the Apache projects we all work on.
>
> === Inexperience with Open Source ===
> The majority of the proposers here have day jobs that have them working near full-time on (Apache) open source projects. A few of us have helped carry other projects through the Incubator. HTrace to date has been developed as an open source project.
>
> === Homogenous Developers ===
> The initial group of committers is small but already we have a healthy diversity of participating companies. We are Bay Area challenged but a Japanese contributor makes for a good counterbalance.
>
> === Reliance on Salaried Developers ===
> Most of the contributors are paid to work in the Hadoop ecosystem. While we might wander from our current employers, we probably won’t go far from the Hadoop tree. Whoever the Hadoop employer, it is plain a successful HTrace project is in everyone’s interest. At least one of the developers has already changed employers but his interest in seeing HTrace succeed prevails.
>
> === Relationships with Other Apache Products ===
> For HTrace to succeed, it is critical we build good relations with other distributed systems projects. We intend to initially build on relations we already have in place, mostly in the Hadoop space.
>
> The HTrace project has been incorporated by Apache HBase and Apache Phoenix. It is currently being actively integrated into Apache HDFS.
>
> We do not know of any equivalent or near-equivalent project in the Apache space.
>
> The Dapper paper notes precedent, in particular, the Berkeley Rad Lab X-Trace project.
>
> ==== How HTrace relates to Zipkin ====
> Zipkin is an Apache-licensed project from Twitter. It is a complete tracing tool with trace collectors, trace viewers and tools to help you generate traces. It is written in Scala. If your project is not Scala, or if it is Java and you cannot afford a Scala dependency, at a minimum you need an alternate means of generating traces. HTrace provides this facility for Java as well as bridging tools to feed traces to Zipkin for query and display.
>
> The projects complement each other.
>
> === An Excessive Fascination with the Apache Brand ===
> While we intend to leverage the Apache ‘branding’ when talking to other projects as testament of our project’s ‘neutrality’, we have no plans for making use of the Apache brand in press releases nor posting billboards advertising acceptance of HTrace into Apache Incubator.
>
> == Documentation ==
> See [[http://htrace.org|htrace.org]] for the current state of the HTrace project and documentation.
>
> How to enable tracing in [[http://hbase.apache.org/book/tracing.html|HBase using HTrace]]
> Elliott Clark on [[http://files.meetup.com/1350427/HBase%20Meetup%20-%20Zipkin.pptx|tracing in HBase]]
>
> == Initial Source ==
> Jonathan Leavitt and Todd Lipcon built the first versions of HTrace in the summer of 2012. Jonathan was Todd’s summer intern at Cloudera.
>
> == Source and Intellectual Property Submission Plan ==
> We know of no legal encumbrances in the way of transfer of source to Apache.
>
> == External Dependencies ==
> HTrace includes third party libs. These include guava, jetty, junit, protobuf, hbase, and thrift. All dependencies are Apache licensed or under licenses that are palatable: e.g. junit is EPL (Eclipse Public License v1.0) and ProtoBufs are BSD licensed.
>
> == Cryptography ==
> N/A
>
> == Required Resources ==
>
> === Mailing lists ===
> * priv...@htrace.incubator.apache.org (moderated subscriptions)
> * comm...@htrace.incubator.apache.org
> * d...@htrace.incubator.apache.org
> * iss...@htrace.incubator.apache.org
> * u...@htrace.incubator.apache.org
>
> === Git Repository ===
> https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
>
> === Issue Tracking ===
> JIRA HTrace (HTRACE)
>
> === Other Resources ===
> Means of setting up regular builds for htrace on builds.apache.org
>
> == Initial Committers ==
> * Colin McCabe (cmcc...@apache.org)
> * Elliott Clark (ecl...@apache.org)
> * Jonathan Leavitt (jon.s.leav...@gmail.com) -- CLA being submitted
> * Masatake Iwasaki (iwasak...@gmail.com) -- CLA being submitted
> * Michael Stack (st...@apache.org)
> * Nick Dimiduk (ndimi...@apache.org)
> * Todd Lipcon (t...@apache.org)
>
> == Affiliations ==
> * Colin McCabe - Cloudera
> * Elliott Clark - Facebook
> * Jonathan Leavitt - Google
> * Masatake Iwasaki - NTTData
> * Michael Stack - Cloudera
> * Nick Dimiduk - Hortonworks
> * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
> Roman Shaposhnik
>
> === Nominated Mentors ===
> * Michael Stack - Apache Member
> * Todd Lipcon - Apache Member
>
> We will be soliciting more mentors as part of the proposal process.
>
> === Sponsoring Entity ===
> We would like to propose that the Apache Incubator sponsor this project.

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)