More things we should get ahead of for Accumulo 2.0.0: distributed tracing.

Right now we have an awkward situation wrt HTrace support. We're using and
shipping HTrace 3.1. It works okay for our internal uses, afaict.

Hadoop 2.6 ships and uses HTrace 3.0. I believe this does not work with 3.1.

Hadoop 2.7 ships and uses HTrace 3.1, which means we can trace things down 
through the hdfs layer, though I've never managed to get anything useful out of 
trying to do so.

Hadoop 2.8+ ships and uses HTrace 4.0, which isn't compatible with 3.1.

Hadoop 2.9+ and 3.0+ ship and use HTrace 4.1, which doesn't work with 3.1, and
it's not clear to me whether it works with 4.0.

I *think* this means that we can only do tracing down into a single minor 
release line of HDFS.
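
To make that concrete, here's a rough sketch of the version matrix above as a lookup. The class and method names are made up for illustration. It assumes that only an exact HTrace version match lets traces cross the Accumulo/HDFS boundary (which is my reading of the situation, not something I've verified), and it records each "N+" entry under its first release line only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HTraceCompat {
    // Hadoop release line -> HTrace version it ships, as listed in this email.
    // "2.8+" and "2.9+" are recorded under their first release line only.
    static final Map<String, String> HADOOP_TO_HTRACE = new LinkedHashMap<>();
    static {
        HADOOP_TO_HTRACE.put("2.6", "3.0");
        HADOOP_TO_HTRACE.put("2.7", "3.1");
        HADOOP_TO_HTRACE.put("2.8", "4.0");
        HADOOP_TO_HTRACE.put("2.9", "4.1");
        HADOOP_TO_HTRACE.put("3.0", "4.1");
    }

    // Assumption: only an exact HTrace version match counts as compatible.
    static List<String> traceableHadoopLines(String ourHTraceVersion) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : HADOOP_TO_HTRACE.entrySet()) {
            if (entry.getValue().equals(ourHTraceVersion)) {
                lines.add(entry.getKey());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // We ship HTrace 3.1 today.
        System.out.println(traceableHadoopLines("3.1")); // prints [2.7]
    }
}
```

With our current 3.1 dependency, the only Hadoop line that comes back is 2.7.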

Additionally, as of their January podling report[1] it looks like HTrace is
about ready to shut down as a project. In theory it could continue on GitHub or
something after exiting the ASF incubator, but I don't see why we'd expect an
increase in activity; the project's been pretty idle for at least a year.

I really don't want to go back to maintaining cloudtrace. So what should we do? 
Zipkin has some decent uptake[2]; they're still making regular releases. Maybe 
just adopt OpenTracing[3] as a way to hedge against having to do this again?
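
For anyone who wants a picture of what "hedge" means here, a minimal sketch of the idea: our code talks to a small vendor-neutral interface and the backend gets plugged in behind it. Everything below (Tracer, Span, RecordingTracer, scan) is hypothetical and invented for illustration. It is not the OpenTracing API, just the general shape of a facade.

```java
import java.util.ArrayList;
import java.util.List;

public class TracingFacadeSketch {
    /** Hypothetical vendor-neutral span; closing it marks the operation finished. */
    interface Span extends AutoCloseable {
        @Override
        void close();
    }

    /** Hypothetical vendor-neutral tracer that callers code against. */
    interface Tracer {
        Span start(String operation);
    }

    /** Stand-in backend that only records events; a real one would report to a collector. */
    static final class RecordingTracer implements Tracer {
        final List<String> events = new ArrayList<>();

        @Override
        public Span start(String operation) {
            events.add("start:" + operation);
            return () -> events.add("finish:" + operation);
        }
    }

    /** Instrumented code depends only on the facade, never on a specific backend. */
    static void scan(Tracer tracer) {
        try (Span span = tracer.start("scan")) {
            // ... the actual work would happen here ...
        }
    }

    public static void main(String[] args) {
        RecordingTracer tracer = new RecordingTracer();
        scan(tracer);
        System.out.println(tracer.events); // prints [start:scan, finish:scan]
    }
}
```

Swapping backends then means writing a new Tracer implementation rather than touching every instrumented call site.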

YCSB is the Accumulo downstream with tracing that I'm most familiar with. It 
relies on HTrace 4.1. I'm probably going to push that community to abandon 
HTrace soon. (I was the one who pushed it to adopt HTrace in the first place.)
I'd like to at least get tracing from YCSB through Accumulo in the process.

[1]: https://s.apache.org/uDHN
[2]: https://zipkin.io/
[3]: http://opentracing.io/documentation/
