Josh,

In theory, I'd be most happy if it were possible to have both sides of the accumulo sandwich (one side being "the application", the other being HDFS/zookeeper/etc.) instrumented and tracing in the same way to get a comprehensive view. I just don't know how realistic that is.
On Wed, Feb 28, 2018 at 10:02 AM, Josh Elser <els...@apache.org> wrote:

> Thanks for letting us know, Tony. I can totally understand how the
> server-side tracing (and collapsing it can do) would be super-helpful in
> figuring out what's happening.
>
> I read that as one reason for simply not trying to get HDFS and Accumulo
> re-sync'ed. I think we have value in leaving what we presently have in
> Accumulo now over removing it completely.
>
> On 2/27/18 8:50 PM, Tony Kurc wrote:
>
>> Josh,
>> It was exclusively the first - using the traces in the server-side code.
>> The most common case is "I have a scan which is much slower than
>> expected", and couldn't figure out why. I'm trying to think of
>> alternative approaches to using the traces, and honestly, doing a bunch
>> of log aggregation is the alternative I'd have to fall back to, and in
>> some cases recompiling parts of accumulo with new log messages in place.
>>
>> Tony
>>
>> On Tue, Feb 27, 2018 at 7:18 PM, Josh Elser <els...@apache.org> wrote:
>>
>>> Oh, that's a pleasant surprise to hear, actually.
>>>
>>> Anything you can share with the class, Tony? Would love to hear (even if
>>> brief) how it was used and benefited you.
>>>
>>> Specifically, I'm curious if...
>>>
>>> * You looked at traces from our server-side instrumented code
>>> * You instrumented your own code outside of Accumulo and used Accumulo as
>>>   the backing store
>>> * You instrumented code inside/outside Accumulo and benefited from the
>>>   server-side instrumentation (e.g. your code's spans collapsing with the
>>>   server's spans)
>>>
>>> On 2/27/18 6:52 PM, Tony Kurc wrote:
>>>
>>>> I'd personally be disappointed to see it removed. There is a bit of a
>>>> learning curve and startup cost to use it now, but when diagnosing major
>>>> challenges, it has been an invaluable capability.
>>>>
>>>> On Feb 27, 2018 3:15 PM, "Josh Elser" <els...@apache.org> wrote:
>>>>
>>>> Wow... that's, erm, quite the paper. Nothing like taking some pot-shots
>>>> at another software project and quoting folks out of context.
>>>>
>>>> Does it help to break down the problem some more?
>>>>
>>>> * Is Accumulo getting benefit from tracing its library?
>>>> * Is Accumulo getting benefit from tracing context including HDFS calls?
>>>>
>>>> I feel like it is a nice tool to have in your toolbelt (having used it
>>>> successfully in the past), but I wonder if it's the most effective thing
>>>> to keep inside of Accumulo. Specifically, would it be better to just
>>>> pull this out of Accumulo outright?
>>>>
>>>> I don't think I have an opinion yet.
>>>>
>>>> On 2/27/18 1:08 PM, Ed Coleman wrote:
>>>>
>>>>> For general discussion - Facebook recently (Oct 28, 2017) published a
>>>>> paper on tracing: Canopy: An End-to-End Performance Tracing and
>>>>> Analysis System
>>>>> (https://research.fb.com/publications/canopy-end-to-end-performance-tracing-at-scale/)
>>>>>
>>>>> As a bonus, they referenced Accumulo and HTrace in section 2.2
>>>>>
>>>>> "Mismatched models affected compatibility between mixed system
>>>>> versions; e.g. Accumulo and Hadoop were impacted by the “continued
>>>>> lack of concern in the HTrace project around tracing during upgrades”
>>>>>
>>>>> -----Original Message-----
>>>>> From: Tony Kurc [mailto:tk...@apache.org]
>>>>> Sent: Tuesday, February 27, 2018 12:57 PM
>>>>> To: dev@accumulo.apache.org
>>>>> Subject: Re: [DISCUSS] tracing framework updates
>>>>>
>>>>> I have some experience with opentracing, and it definitely seems
>>>>> promising, however, potentially promising in the same way htrace was...
>>>>> That being said, I did a cursory thought exercise of what it would take
>>>>> to do a swap of the current tracing in accumulo to opentracing, and I
>>>>> didn't come across any hard problems, meaning it could be a fairly
>>>>> straightforward refactor. I was hoping to explore the community a bit
>>>>> more at some upcoming conferences
>>>>>
>>>>> On Feb 27, 2018 11:59 AM, "Sean Busbey" <bus...@apache.org> wrote:
>>>>>
>>>>>> On 2018/02/27 16:39:02, Christopher <ctubb...@apache.org> wrote:
>>>>>>
>>>>>>> I didn't realize HTrace was struggling in incubation. Maybe some of
>>>>>>> us can start participating? The project did start within Accumulo,
>>>>>>> after all. What does it need? I also wouldn't want to go back to
>>>>>>> maintaining cloudtrace.
>>>>>>
>>>>>> I suspect it's too late for HTrace. The last commit to the main
>>>>>> development branch was May 2017. They had a decent run of activity in
>>>>>> 2015 and an almost-resurgence in 2016, but they never really got
>>>>>> enough community traction to survive the normal ebb and flow of
>>>>>> contributor involvement.
>>>>>>
>>>>>> They need the things any project needs to be sustainable: regular
>>>>>> release cadences, a responsive contribution process, and folks to do
>>>>>> the long slog of building interest via e.g. production adoption.
>>>>>>
>>>>>>> I'm unfamiliar with OpenTracing, but it was my understanding that
>>>>>>> Zipkin was more of a tracing sink, than an instrumentation API.
>>>>>>> HTrace is actually listed as an instrumentation library for Zipkin
>>>>>>> (among others).
>>>>>>
>>>>>> I think the key is that for a instrumentation library to get adoption
>>>>>> it needs a good sink that provides utility to operators looking to
>>>>>> diagnose problems. It took too long for HTrace to provide any tooling
>>>>>> that could help with even simple performance profiling. Maybe hooking
>>>>>> it into Zipkin would get around that. Personally, I never managed to
>>>>>> get the two to actually work together.
>>>>>>
>>>>>> My listing Zipkin as an option merely reflects my prioritization of
>>>>>> practical impact of whatever we go to. I don't want to adopt some
>>>>>> blue-sky effort. FWIW, OpenTracing docs at least claim to also provide
>>>>>> a zipkin-sink compatible runtime.
>>>>>>
>>>>>> There's a whole community that just does distributed monitoring, maybe
>>>>>> someone has time to survey some spaces and see if OpenTracing has any
>>>>>> legs.