Re: [DISCUSS] tracing framework updates

Tony Kurc Wed, 28 Feb 2018 07:12:26 -0800

Josh,
In theory, I'd be most happy if it were possible to have both sides of the
accumulo sandwich (one side being "the application", the other being
HDFS/zookeeper/etc.) instrumented and tracing in the same way to get a
comprehensive view. I just don't know how realistic that is.




On Wed, Feb 28, 2018 at 10:02 AM, Josh Elser <[email protected]> wrote:

> Thanks for letting us know, Tony. I can totally understand how the
> server-side tracing (and the collapsing it can do) would be super-helpful in
> figuring out what's happening.
>
> I read that as one reason for simply not trying to get HDFS and Accumulo
> re-sync'ed. I think we have value in leaving what we presently have in
> Accumulo now over removing it completely.
>
>
> On 2/27/18 8:50 PM, Tony Kurc wrote:
>
>> Josh,
>> It was exclusively the first - using the traces in the server-side code.
>> The most common case is "I have a scan which is much slower than expected",
>> and couldn't figure out why. I'm trying to think of alternative approaches
>> to using the traces, and honestly, doing a bunch of log aggregation is the
>> alternative I'd have to fall back to, and in some cases recompiling parts
>> of accumulo with new log messages in place.
>>
>>
>> Tony
>>
>> On Tue, Feb 27, 2018 at 7:18 PM, Josh Elser <[email protected]> wrote:
>>
>>> Oh, that's a pleasant surprise to hear, actually.
>>>
>>> Anything you can share with the class, Tony? Would love to hear (even if
>>> brief) how it was used and benefited you.
>>>
>>> Specifically, I'm curious if...
>>>
>>> * You looked at traces from our server-side instrumented code
>>> * You instrumented your own code outside of Accumulo and used Accumulo as
>>> the backing store
>>> * You instrumented code inside/outside Accumulo and benefited from the
>>> server-side instrumentation (e.g. your code's spans collapsing with the
>>> server's spans)
>>>
>>>
>>> On 2/27/18 6:52 PM, Tony Kurc wrote:
>>>
>>>> I'd personally be disappointed to see it removed. There is a bit of a
>>>> learning curve and startup cost to use it now, but when diagnosing major
>>>> challenges, it has been an invaluable capability.
>>>>
>>>> On Feb 27, 2018 3:15 PM, "Josh Elser" <[email protected]> wrote:
>>>>
>>>> Wow... that's, erm, quite the paper. Nothing like taking some pot-shots at
>>>> another software project and quoting folks out of context.
>>>>
>>>> Does it help to break down the problem some more?
>>>>
>>>> * Is Accumulo getting benefit from tracing its library?
>>>> * Is Accumulo getting benefit from tracing context including HDFS calls?
>>>>
>>>> I feel like it is a nice tool to have in your toolbelt (having used it
>>>> successfully in the past), but I wonder if it's the most effective thing to
>>>> keep inside of Accumulo. Specifically, would it be better to just pull this
>>>> out of Accumulo outright?
>>>>
>>>> I don't think I have an opinion yet.
>>>>
>>>>
>>>> On 2/27/18 1:08 PM, Ed Coleman wrote:
>>>>
>>>>> For general discussion - Facebook recently (Oct 28, 2017) published a
>>>>> paper on tracing: Canopy: An End-to-End Performance Tracing and Analysis
>>>>> System
>>>>> (https://research.fb.com/publications/canopy-end-to-end-performance-tracing-at-scale/)
>>>>>
>>>>> As a bonus, they referenced Accumulo and HTrace in section 2.2:
>>>>>
>>>>> "Mismatched models affected compatibility between mixed system versions;
>>>>> e.g. Accumulo and Hadoop were impacted by the “continued lack of concern
>>>>> in the HTrace project around tracing during upgrades”"
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Tony Kurc [mailto:[email protected]]
>>>>> Sent: Tuesday, February 27, 2018 12:57 PM
>>>>> To: [email protected]
>>>>> Subject: Re: [DISCUSS] tracing framework updates
>>>>>
>>>>> I have some experience with opentracing, and it definitely seems
>>>>> promising; however, potentially promising in the same way htrace was...
>>>>> That being said, I did a cursory thought exercise of what it would take to
>>>>> do a swap of the current tracing in accumulo to opentracing, and I didn't
>>>>> come across any hard problems, meaning it could be a fairly straightforward
>>>>> refactor. I was hoping to explore the community a bit more at some upcoming
>>>>> conferences.
>>>>>
>>>>> On Feb 27, 2018 11:59 AM, "Sean Busbey" <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On 2018/02/27 16:39:02, Christopher <[email protected]> wrote:
>>>>>>
>>>>>>> I didn't realize HTrace was struggling in incubation. Maybe some of us
>>>>>>> can start participating? The project did start within Accumulo, after
>>>>>>> all. What does it need? I also wouldn't want to go back to maintaining
>>>>>>> cloudtrace.
>>>>>>
>>>>>> I suspect it's too late for HTrace. The last commit to the main
>>>>>> development branch was May 2017. They had a decent run of activity in
>>>>>> 2015 and an almost-resurgence in 2016, but they never really got
>>>>>> enough community traction to survive the normal ebb and flow of
>>>>>> contributor involvement.
>>>>>>
>>>>>> They need the things any project needs to be sustainable: regular
>>>>>> release cadences, a responsive contribution process, and folks to do
>>>>>> the long slog of building interest via e.g. production adoption.
>>>>>>
>>>>>>> I'm unfamiliar with OpenTracing, but it was my understanding that
>>>>>>> Zipkin was more of a tracing sink than an instrumentation API. HTrace
>>>>>>> is actually listed as an instrumentation library for Zipkin (among
>>>>>>> others).
>>>>>>
>>>>>> I think the key is that for an instrumentation library to get adoption
>>>>>> it needs a good sink that provides utility to operators looking to
>>>>>> diagnose problems. It took too long for HTrace to provide any tooling
>>>>>> that could help with even simple performance profiling. Maybe hooking
>>>>>> it into Zipkin would get around that. Personally, I never managed to
>>>>>> get the two to actually work together.
>>>>>>
>>>>>> My listing Zipkin as an option merely reflects my prioritization of
>>>>>> practical impact of whatever we go to. I don't want to adopt some
>>>>>> blue-sky effort. FWIW, OpenTracing docs at least claim to also provide
>>>>>> a zipkin-sink compatible runtime.
>>>>>>
>>>>>> There's a whole community that just does distributed monitoring, maybe
>>>>>> someone has time to survey some spaces and see if OpenTracing has any
>>>>>> legs.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
