Thanks Adrian for the editorial on the landscape. It helps, especially coming from you.
Given the current state of the project, a retrofit to come up on OT is not the solution to the topic at hand (and besides, I have a colored opinion on taking on the API of another after spending a bunch of time recently undoing our mistake of letting third-party Interfaces and Classes show through in hbase). I appreciate the higher-level point made by Andrew, that it is hard to thread a cross-cutting library across the Hadoop landscape, whether because releases happen on a geologic time scale or because there is little by way of coordination.

Can we do a focused 'win' like Colin suggests? E.g. hook up hbase and hdfs end-to-end with a connection to a viewer (zipkin? Or text dumps in a webpage?). A while back I had a go at the hbase side, but it was burning up the hours just getting it hooked up w/ tests to scream if any spans were broken in a refactor. I had to put it aside. Like the rest of you, my time is a little occupied elsewhere these days, so I can't revive the project, not at the moment at least.

S

On Thu, Aug 17, 2017 at 5:56 PM, Adrian Cole <adrian.f.c...@gmail.com> wrote:

> Just speaking on the OpenTracing vs whatever part. What Colin mentioned is
> correct. It is a library API defined for tracing and not an implementation
> of a tracer or a backend.
>
> That said, there are certain backends that are preferred, notably Lightstep
> and Jaeger (by Uber). This is because folks here did most of the defining,
> even if others do participate. This affects a view of what tracing is
> inside OT. Notably, both have a view that logging is tracing (e.g. it is ok
> and sometimes encouraged to push system logs into a span). These opinions
> are sometimes encouraged through presentations etc., which might make it a
> better or worse fit as an HTrace replacement. For example, most in Zipkin
> are not keen on escalating it to a logging system, as it was not designed
> for this, and, similarly to here, we couldn't afford to accept more
> responsibility like that.
>
> HTrace is almost never mentioned in OpenTracing discussions except when I
> bring it up. That by itself has been troubling to me: if it were meant to
> be neutral, it should have been mentioned constantly and been impacting
> design. Anyway..
>
> The "actual Dapper" team, which is called Census, have spun up and are
> moving fast. This has no backend yet, but most can or soon will report to
> Zipkin. https://github.com/census-instrumentation
>
> Most important to all of this imho is that the jury is out on whether
> instrumentation libraries are indeed shared. For example, even though
> Amazon, Microsoft, Dynatrace, AppDynamics, New Relic, Facebook etc. all
> know about OpenTracing, it isn't what they are using as a core API. In
> some cases it is because they have an event layer instead, and in others
> it is that they prefer a data-type approach as opposed to a dictated
> library interface. Some in OpenTracing have struggled to influence the
> project around points they have felt important, notably propagation, and
> wrote their own bespoke layers or wrappers to handle it properly. Some of
> this is fixable in OT, but imho the change dynamics, culture and
> leadership have not changed since inception.
>
> Many Zipkin users use OpenTracing libraries, probably due to the high
> level of staff and marketing they have behind the effort. For example,
> Red Hat staff write a lot of things faster than volunteers can. That said,
> many Zipkin users prefer existing, especially well-attended, libraries by
> the project or ecosystem. Looking at GitHub, native adoption is far ahead
> of OT. In many cases, users still roll their own. This is not the same as
> lack of a complete choice.. developers can, do, and continue to write
> their own code if given a spec on how to do it. This is also true in
> OpenTracing, except there you need to know both the abstraction and the
> backend to write custom code.
>
> OpenTracing is in CNCF now, as is their preferred system Jaeger.
> As far as I know you wouldn't also be in the ASF, but I don't know if
> that matters. Census is likely to be CNCF because Google (but I have no
> insight, just a guess). Zipkin is on hold wrt a foundation; we didn't
> have enough oomph to get to one last year, so the jury is still out.
>
> Personally, I think Census have a lot of things right, e.g. separation of
> concerns between logging, metrics, tracing and propagation. That said, I
> think all could learn from HTrace, or collaborate, regardless of this
> outcome.
>
> On 18 Aug 2017 05:57, "Colin McCabe" <cmcc...@apache.org> wrote:
>
> On Thu, Aug 17, 2017, at 14:40, Andrew Purtell wrote:
> > > That's not the issue. We already have HTrace integration with Hadoop
> > > RPC, such that a Hadoop RPC creates a span.
> >
> > This is an issue. I'm glad Hadoop RPC is covered, but nobody but Hadoop
> > uses it. Likewise, HBase RPC. These are not general-purpose RPC stacks
> > by any stretch. There are some of those around. Some have tracing built
> > in. They take some of the oxygen out of the room. I think that is a
> > fair point when thinking about the viability of a podling that sees
> > little activity as it is.
>
> Yeah-- maybe we should integrate HTrace into HBase RPC as well.
>
> I don't think RPC-specific trace systems have been strong competitors.
> Since the RPC landscape is so fragmented, those systems tend not to get
> used by many people. Our strongest open source competitors, OpenTracing
> and OpenZipkin, support multiple RPC systems. (Zipkin was originally
> specific to Finagle, but that is no longer true.)
>
> > I didn't come here to suggest HTrace go away, though. I came to raise
> > a few points on why adoption and use of HTrace has very likely
> > suffered from usability problems. These problems are still not
> > completely resolved. Stack describes HTrace integration with HBase as
> > broken. My experience has been that I have to patch POMs, and patch
> > HDFS, HBase, and Phoenix code, to get anything that works at all.
> > I also sought to tie some of those problems to ecosystem issues
> > because I know it is hard. For what it's worth, thanks.
>
> I think you make some very good points about the difficulty of doing
> cross-project coordination. One thing that really held back HTrace 4.0
> was that it was originally scheduled to be part of Hadoop 2.8-- and the
> Hadoop 2.8 release was delayed for a really, really long time, to the
> point where it almost became a punchline. So people had to use vendor
> releases to get HTrace 4, because those were the only releases with new
> Hadoop code.
>
> Colin
>
> > On Thu, Aug 17, 2017 at 2:21 PM, Colin McCabe <cmcc...@apache.org> wrote:
> > > On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> > > > What about OpenTracing (http://opentracing.io/)? Is this the
> > > > successor project to ZipKin? In particular grpc-opentracing (
> > > > https://github.com/grpc-ecosystem/grpc-opentracing) seems to
> > > > finally fulfill in open source the tracing architecture described
> > > > in the Dapper paper.
> > >
> > > OpenTracing is essentially an API which sits on top of another
> > > tracing system.
> > >
> > > So you can instrument your code with the OpenTracing library, and
> > > then have that send the trace spans to OpenZipkin.
> > >
> > > Here are some thoughts about this topic from a Zipkin developer:
> > > https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
> > > Probably Adrian Cole can chime in here as well.
> > >
> > > In general the OpenTracing folks have been friendly and respectful.
> > > (If any of them are reading this, I apologize for not following some
> > > of the discussions on gitter more thoroughly-- my time is just split
> > > so many ways right now!)
> > >
> > > > If one takes a step back and looks at all of the hand-rolled RPC
> > > > stacks in the Hadoop ecosystem, it's a mess.
> > > > It is a heavier lift, but getting everyone migrated to a single
> > > > RPC stack - gRPC - would provide the unified tracing layer
> > > > envisioned by HTrace. The tracing integration is then done in
> > > > exactly one place. In contrast, HTrace requires all of the
> > > > components to sprinkle spans throughout the application code.
> > >
> > > That's not the issue. We already have HTrace integration with Hadoop
> > > RPC, such that a Hadoop RPC creates a span. Integration with any RPC
> > > system is actually very straightforward-- you just add two fields to
> > > the base RPC request definition, and patch the RPC system to use
> > > them.
> > >
> > > Just instrumenting RPC is not sufficient. You need programmers to
> > > add explicit span annotations to your code so that you can have
> > > useful information beyond what a program like Wireshark would find.
> > > Things like what disk a request is hitting, what HBase PUT an HDFS
> > > write is associated with, and so forth.
> > >
> > > Also, this is getting off topic, but there is a new RPC system every
> > > year or two. Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle,
> > > GRPC, REST/JSON, etc. They all have advantages and disadvantages.
> > > For example, GRPC depends on protobuf-- and Hadoop has a lot of
> > > deployment and performance problems with the protobuf-java library.
> > > I wish GRPC luck, but I think it's good for people to experiment
> > > with different libraries. It doesn't make sense to try to force
> > > everyone to use one thing, even if we could.
> > >
> > > > The Hadoop ecosystem is always partially at odds with itself, if
> > > > for no other reason than there is no shared vision among the
> > > > projects. There are no coordinated releases. There isn't even
> > > > agreement on which version of shared dependencies to use (hence
> > > > the recurring pain in various places with downstream version
> > > > changes of protobuf, guava, jackson, etc. etc.).
> > > > Therefore HTrace is severely constrained in what API changes can
> > > > be made. Unfortunately the different major versions of HTrace do
> > > > not interoperate at all, and are not even source compatible. While
> > > > that is not unreasonable at all for a project in incubation, when
> > > > combined with the inability of the Hadoop ecosystem to coordinate
> > > > releases as a cross-cutting dependency ships a new version, this
> > > > has reduced the utility of HTrace to effectively nil for the
> > > > average user. I am sorry to say that. Only a commercial Hadoop
> > > > vendor or power user can be expected to patch and build a stack
> > > > that actually works.
> > >
> > > One correction: the different major versions of HTrace are indeed
> > > source code compatible. You can build an application that can use
> > > both HTrace 3 and HTrace 4. This was absolutely essential for us
> > > because of the version skew issues you mention.
> > >
> > > > On Thu, Aug 17, 2017 at 11:04 AM, lewis john mcgibbney <
> > > > lewi...@apache.org> wrote:
> > > >
> > > > > Hi Mike,
> > > > > I think this is a fair question. We've probably all been
> > > > > associated with projects which just don't really make it. It
> > > > > would appear that HTrace is one of them. This is not to say that
> > > > > there is nothing going on with the tracing effort generally (as
> > > > > there is), but it looks like HTrace as a project may be headed
> > > > > to the Attic.
> > > > > I suppose the response to this thread will determine what
> > > > > happens...
> > >
> > > Thanks, Lewis.
> > >
> > > I think maybe we should try to identify the top tracing priorities
> > > for HBase and HDFS and see how HTrace / OpenTracing / OpenZipkin
> > > could fit into those. Just start from a nice crisp set of
> > > requirements, like Stack suggested, and think about how we could
> > > make those a reality.
If > > > we can advance the state of tracing in hadoop, that will be a good > thing > > > for our users, even if htrace goes to the attic. I've been mostly > > > working on Apache Kafka these days but I could drop by to brainstorm. > > > > > > best, > > > Colin > > > > > > > > > > > Lewis > > > > > > > > > > > > > > > > > > > > On Wed, Aug 16, 2017 at 10:01 AM, < > > > > > dev-digest-h...@htrace.incubator.apache.org> wrote: > > > > > > > > > > > > > > > > > From: Mike Drob <md...@apache.org> > > > > > > To: dev@htrace.incubator.apache.org > > > > > > Cc: > > > > > > Bcc: > > > > > > Date: Wed, 16 Aug 2017 12:00:49 -0500 > > > > > > Subject: [DISCUSS] Attic podling Apache HTrace? > > > > > > Hi folks, > > > > > > > > > > > > Want to bring up a potentially uncofortable topic for some. Is it > > > time to > > > > > > retire/attic the project? > > > > > > > > > > > > We've seen a minimal amount of activity in the past year. The > last > > > > > release > > > > > > had two bug fixes, and had been pending for several months before > > > > > somebody > > > > > > reminded me to push the artifacts to subversion from the staging > > > > > directory. > > > > > > > > > > > > I'd love to see a renewed set of activity here, but I don't think > > > there > > > > > is > > > > > > a ton of interest going on. > > > > > > > > > > > > HBase is still on version 3. So is Accumulo, I think. Hadoop is > on > > > 4.1, > > > > > > which is a good sign, but I haven't heard much from them > recently. I > > > > > > definitely do no think we are at the point where a lack of > releases > > > and > > > > > > activity is a sign of super advanced maturity and stability. > > > > > > > > > > > > Your thoughts? 
> > > > > >
> > > > > > Mike
> > > > >
> > > > > --
> > > > > http://home.apache.org/~lewismc/
> > > > > @hectorMcSpector
> > > > > http://www.linkedin.com/in/lmcgibbney
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Words like orphans lost among the crosstalk, meaning torn from
> > > > truth's decrepit hands
> > > > - A23, Crosstalk
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from
> > truth's decrepit hands
> > - A23, Crosstalk
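[Editor's note: the "add two fields to the base RPC request definition" integration Colin describes upthread can be sketched roughly as follows in plain Java. This is a toy illustration under assumed names (RpcHeader, Span, inject, extract are all invented for the example), not HTrace's, Hadoop's, or HBase's actual API.]

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of trace propagation over RPC: the client attaches a
// trace ID and its current span ID to each request, and the server uses
// them to start a child span linked to the caller's trace.
public class TraceRpcSketch {

    // The "two fields" added to the base RPC request definition.
    static final class RpcHeader {
        final long traceId;       // identifies the whole end-to-end trace
        final long parentSpanId;  // the caller's current span
        RpcHeader(long traceId, long parentSpanId) {
            this.traceId = traceId;
            this.parentSpanId = parentSpanId;
        }
    }

    // A minimal span: trace ID, its own random ID, and a parent link.
    static final class Span {
        final long traceId;
        final long spanId;
        final long parentSpanId;
        final String description;
        Span(long traceId, long parentSpanId, String description) {
            this.traceId = traceId;
            this.spanId = ThreadLocalRandom.current().nextLong();
            this.parentSpanId = parentSpanId;
            this.description = description;
        }
    }

    // Client side: copy the caller's span context into the request header.
    static RpcHeader inject(Span current) {
        return new RpcHeader(current.traceId, current.spanId);
    }

    // Server side: start a child span from the header, so server-side work
    // (e.g. "what disk is this request hitting") joins the caller's trace.
    static Span extract(RpcHeader header, String description) {
        return new Span(header.traceId, header.parentSpanId, description);
    }
}
```

These two fields are what let a server-side span link back to the client span that caused it (e.g. which HBase PUT an HDFS write is associated with); the explicit span annotations Colin mentions are the `Span` objects application code would create around its own operations.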