Hi Mike,

> Why does Dr. Elephant make sense as a separate project instead of
> contributing to Hadoop directly?
>
Here are a couple of reasons why I think Dr. Elephant is more likely to
succeed as a separate project:

* Dr. Elephant supports Hadoop *and* Spark, and may support other
  execution layers in the future. If we make Dr. Elephant a part of
  Hadoop, I expect that it will discourage contributions from people who
  are interested mainly in Spark support, and vice versa.
* If Dr. Elephant is added to Hadoop, it will be necessary for the Hadoop
  project to declare a dependency on Spark. I doubt this change will get
  approved.
* We don't want to tie Dr. Elephant to a specific version of Hadoop or
  Spark, or tie the Dr. Elephant release cycle to the Hadoop or Spark
  release cycles.
* None of the current Dr. Elephant committers are Hadoop committers, and
  I doubt that the Hadoop PMC is going to give them a commit bit just to
  work on Dr. Elephant. As a result, the existing committers would
  effectively be forfeiting their right to continue maintaining their own
  project. I think this is one of the reasons why many Hadoop contrib
  projects are poorly maintained.

> What is the relationship between Dr. Elephant and the (now seemingly
> defunct) Hadoop Vaidya?
>

Vaidya was a command-line tool for tuning Hadoop jobs. Dr. Elephant is an
always-on service for tuning Hadoop and Spark jobs. We were unaware of
Vaidya when we started working on Dr. Elephant.

- Carl