[jira] [Commented] (HDFS-11622) TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost

2017-04-13 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967360#comment-15967360
 ] 

Masatake Iwasaki commented on HDFS-11622:
-

Thanks for the comment, [~apurtell]. Considering batching in the HBase WAL, 
HDFS needs a feature to add a parent span when {{DFSOutputStream#write}} is 
called to append each WALEntry (and when {{hflush}} is called too?). Even the 
code in trunk does not have this yet. Currently, parent spans are added only 
when a DFSPacket is queued.

The issue described here is about tracking tracing spans by trace id rather 
than by parent-child relationship. This is relevant to branch-2.7 only. While 
this is not crucial for end-to-end tracing of HBase WAL batching, I welcome a 
patch to fix this issue if there is a need to analyze tracing spans by trace 
id only.

> TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is 
> lost
> --
>
> Key: HDFS-11622
> URL: https://issues.apache.org/jira/browse/HDFS-11622
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: tracing
> Reporter: Karan Mehta
>
> In the {{run()}} method of the {{DataStreamer}} class, the following code is 
> written. {{parents\[0\]}} refers to the {{spanId}} of the parent span.
> {code}
>   one = dataQueue.getFirst(); // regular data packet
>   long parents[] = one.getTraceParents();
>   if (parents.length > 0) {
>     scope = Trace.startSpan("dataStreamer", new TraceInfo(0, parents[0]));
>     // TODO: use setParents API once it's available from HTrace 3.2
>     // scope = Trace.startSpan("dataStreamer", Sampler.ALWAYS);
>     // scope.getSpan().setParents(parents);
>   }
> {code}
> The {{scope}} starts a new TraceSpan with a traceId hardcoded to 0. Ideally 
> the traceId should be captured from the current span when 
> {{currentPacket.addTraceParent(Trace.currentSpan())}} is invoked. This JIRA 
> proposes an additional long field inside the {{DFSPacket}} class which holds 
> the parent {{traceId}}. 
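
The proposal in the description could look roughly like the sketch below. It is a minimal, self-contained illustration under stated assumptions, not the actual HDFS patch: {{SpanStub}} and {{PacketStub}} are hypothetical stand-ins for the HTrace span and {{DFSPacket}}, and {{parentTraceId}} is an assumed name for the proposed field. The packet remembers the trace id when the first parent is added, so the streamer would no longer need to pass 0.

```java
import java.util.Arrays;

// Hypothetical stand-ins only; these are not the real HDFS or HTrace classes.
final class PacketTraceSketch {

  /** Stand-in for a span: only the two ids matter for this sketch. */
  static final class SpanStub {
    final long traceId;
    final long spanId;

    SpanStub(long traceId, long spanId) {
      this.traceId = traceId;
      this.spanId = spanId;
    }
  }

  /** Stand-in for DFSPacket, extended with the proposed trace id field. */
  static final class PacketStub {
    private long[] traceParents = new long[0];
    private long parentTraceId = 0L; // assumed field name; 0 means "not set"

    /** Records the parent span id and keeps the trace id of the first parent. */
    void addTraceParent(SpanStub span) {
      if (span == null) {
        return;
      }
      traceParents = Arrays.copyOf(traceParents, traceParents.length + 1);
      traceParents[traceParents.length - 1] = span.spanId;
      if (traceParents.length == 1) {
        // Matches parents[0], which is what the streamer uses as parent.
        parentTraceId = span.traceId;
      }
    }

    long[] getTraceParents() {
      return traceParents;
    }

    long getParentTraceId() {
      return parentTraceId;
    }
  }

  /**
   * Returns {traceId, parentSpanId}, the two values the streamer would hand
   * to TraceInfo instead of (0, parents[0]); null if there is no parent.
   */
  static long[] traceInfoFor(PacketStub packet) {
    long[] parents = packet.getTraceParents();
    if (parents.length == 0) {
      return null;
    }
    return new long[] {packet.getParentTraceId(), parents[0]};
  }
}
```

In this sketch the trace id is read in the thread that adds the parent, which is where the current span is still available; the streamer thread only consumes what the packet carries.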



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11622) TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost

2017-04-07 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961241#comment-15961241
 ] 

Andrew Purtell commented on HDFS-11622:
---

bq. I would appreciate a patch if there are specific problems in downstream 
projects like Phoenix; otherwise, we should be conservative about fixing core 
DFS code in a maintenance branch.

[~masatana] The spans in HDFS can have multiple parents because of batching in 
the HBase WAL, which Phoenix uses as a platform. An interesting case. I think 
that if this improvement is not committed to the shipping versions of Hadoop, 
then Phoenix tracing, which is based on HTrace, won't work correctly and cannot 
be used reliably. HTrace is promising, but it needs to work correctly across 
the whole stack or it is rendered moot. 




[jira] [Commented] (HDFS-11622) TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost

2017-04-07 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960420#comment-15960420
 ] 

Masatake Iwasaki commented on HDFS-11622:
-

bq. If you want, I can submit a patch for this one.

I would appreciate a patch if there are specific problems in downstream 
projects like Phoenix; otherwise, we should be conservative about fixing core 
DFS code in a maintenance branch.





[jira] [Commented] (HDFS-11622) TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost

2017-04-06 Thread Karan Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960131#comment-15960131
 ] 

Karan Mehta commented on HDFS-11622:


bq. As you pointed out, the current dataStreamer span in branch-2.7 does not 
seem to have a situation where multiple parents are set. It looks like an 
extension for future use?

Looks like an extension, since branch-2.7 depends on HTrace 3.1, which doesn't 
have that API. I had a quick look at the code: when creating a new 
{{MilliSpan}}, the constructor can take only one parent as input. Spans can be 
assigned an array of parents only if they are built explicitly using the 
{{Builder}} class inside the {{MilliSpan}} class. At this point, the usages of 
this builder class also provide only a single parentId as input. To my 
knowledge, for this particular branch we can go ahead by saving the traceId in 
the {{DFSPacket}}, if that seems acceptable. Let me know your thoughts.

If you want, I can submit a patch for this one.




[jira] [Commented] (HDFS-11622) TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost

2017-04-06 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960078#comment-15960078
 ] 

Masatake Iwasaki commented on HDFS-11622:
-

bq. The DFSPacket initializes the parents field when it puts the data into 
dataQueue, with the line packet.addTraceParent(Tracer.getCurrentSpanId()), thus 
getting the current trace from the ThreadLocal. At this point, I feel that we 
can also get the value of the trace ID and add it to the DFSPacket. Any 
thoughts on this one?

As you pointed out, the current {{dataStreamer}} span in branch-2.7 does not 
seem to have a situation where multiple parents are set. It looks like an 
extension for future use?





[jira] [Commented] (HDFS-11622) TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost

2017-04-06 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960074#comment-15960074
 ] 

Masatake Iwasaki commented on HDFS-11622:
-

bq. I am unclear about the use of the trace ID at this point if all of them can 
be easily traced via their parent span IDs. Even in cases where the trace 
doesn't form any DAG and is a linearly growing chain of spans, the information 
can still be tracked via the parent span ID.

That's right. In HTrace 4, there is no trace id; part of the span id serves as 
its equivalent.
https://github.com/apache/incubator-htrace/blob/4.2/htrace-core4/src/main/java/org/apache/htrace/core/SpanId.java#L30-L31

I think a trace id is useful for efficiently retrieving the relevant spans from 
the whole span space before traversing parent-child relationships. If you use 
HBase or a similar store for spans, you can co-locate relevant spans by using 
the trace id as the leading bits of the rowkey.
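
As a rough illustration of the rowkey idea, the sketch below builds a 16-byte key with the trace id in the leading 8 bytes and the span id in the trailing 8 bytes, so that all spans of one trace share a prefix. The class and method names are hypothetical, the key layout is an assumption made for this example, and no HBase API is used.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; the key layout is an assumption for illustration.
final class SpanRowKeySketch {

  /** Rowkey = 8 bytes of trace id followed by 8 bytes of span id (big-endian). */
  static byte[] rowKey(long traceId, long spanId) {
    return ByteBuffer.allocate(2 * Long.BYTES)
        .putLong(traceId)
        .putLong(spanId)
        .array();
  }

  /** Prefix shared by every rowkey that belongs to the given trace. */
  static byte[] scanPrefix(long traceId) {
    return ByteBuffer.allocate(Long.BYTES).putLong(traceId).array();
  }

  /** True if the key starts with the given prefix. */
  static boolean hasPrefix(byte[] key, byte[] prefix) {
    if (key.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if (key[i] != prefix[i]) {
        return false;
      }
    }
    return true;
  }
}
```

With such a layout, a prefix scan on {{scanPrefix(traceId)}} would return the spans of one trace from adjacent rows. A span whose trace id is hardcoded to 0 would land under the prefix for 0 instead of next to its parent.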




[jira] [Commented] (HDFS-11622) TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost

2017-04-06 Thread Karan Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959433#comment-15959433
 ] 

Karan Mehta commented on HDFS-11622:


I understand the following to be a use case for such a requirement: [non-RPC 
spans and mapping to multiple parents| 
https://github.com/opentracing/specification/issues/5].
{quote}
Another example is in HBase. HBase has a write-ahead log, where it does "group 
commit." In other words, if HBase gets requests A, B, and C, it does a single 
write-ahead log write for all of them. The WAL writes can be time-consuming 
since they involve writing to an HDFS stream, which could be slow for any 
number of reasons (network, error handling, GC, etc.).
{quote} 

Since requests A, B and C can be started independently, they will be assigned 
different trace IDs as well as span IDs. There will be a single WAL write for 
all of them, represented by a single span with multiple parents, one pointing 
to each request. I am unclear about the use of the trace ID at this point if 
all of them can be easily traced via their parent span IDs. Even in cases where 
the trace doesn't form any DAG and is a linearly growing chain of spans, the 
information can still be tracked via the parent span ID. 
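
The group-commit case can be modelled with a small sketch. It assumes a hypothetical {{SpanStub}} type and made-up ids for requests A, B and C; none of this is HTrace or HBase code. One WAL-write span lists three parents, and looking up those parents yields three different trace ids.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of one WAL write shared by three independent requests.
final class GroupCommitSketch {

  /** Stand-in for a span with a trace id, a span id and parent span ids. */
  static final class SpanStub {
    final long traceId;
    final long spanId;
    final long[] parents;

    SpanStub(long traceId, long spanId, long... parents) {
      this.traceId = traceId;
      this.spanId = spanId;
      this.parents = parents;
    }
  }

  /** Collects the trace ids of all parents of the given span that are known. */
  static Set<Long> parentTraceIds(SpanStub child, Map<Long, SpanStub> bySpanId) {
    Set<Long> ids = new TreeSet<>();
    for (long parentId : child.parents) {
      SpanStub parent = bySpanId.get(parentId);
      if (parent != null) {
        ids.add(parent.traceId);
      }
    }
    return ids;
  }

  /** Requests A, B, C start independently; one WAL span has all three as parents. */
  static Set<Long> demoTraceIds() {
    Map<Long, SpanStub> bySpanId = new HashMap<>();
    bySpanId.put(10L, new SpanStub(1L, 10L)); // request A
    bySpanId.put(20L, new SpanStub(2L, 20L)); // request B
    bySpanId.put(30L, new SpanStub(3L, 30L)); // request C
    SpanStub walWrite = new SpanStub(0L, 40L, 10L, 20L, 30L);
    return parentTraceIds(walWrite, bySpanId);
  }
}
```

In this model the parent links are enough to reach A, B and C from the WAL span, while no single trace id describes the WAL span itself.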

Although we have multiple parents, the way it should work is that all of them 
relate to the same span ID. The commented-out code for future use in the 
Description suggests that all the parents will be available when the 
{{dataStreamer}} span starts. The {{DFSPacket}} initializes the parents field 
when it puts the data into {{dataQueue}}, with the line 
{{packet.addTraceParent(Tracer.getCurrentSpanId())}}, thus getting the current 
trace from the {{ThreadLocal}}. At this point, I feel that we can also get the 
value of the trace ID and add it to the {{DFSPacket}}. Any thoughts on this 
one?




[jira] [Commented] (HDFS-11622) TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost

2017-04-05 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958293#comment-15958293
 ] 

Masatake Iwasaki commented on HDFS-11622:
-

Thanks for reporting this, [~karanmehta93].

We cannot determine which trace id to set if a span has multiple parents. While 
this is a limitation of HTrace after the multiple-parent feature was added, I 
think it is reasonable to set the trace id of the parent if 
{{parents.length == 1}}, for applications currently using trace id to analyze 
tracing spans.
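
The rule suggested above can be written down as a small sketch. It assumes, hypothetically, that the packet records one trace id per parent; {{chooseTraceId}} and {{UNKNOWN_TRACE_ID}} are names invented for this example.

```java
// Hypothetical helper illustrating the "single parent only" rule.
final class TraceIdChoiceSketch {

  /** Value used when no unambiguous trace id exists (the current behaviour). */
  static final long UNKNOWN_TRACE_ID = 0L;

  /**
   * Returns the parent's trace id when there is exactly one parent;
   * otherwise falls back to UNKNOWN_TRACE_ID because the choice is ambiguous.
   */
  static long chooseTraceId(long[] parentTraceIds) {
    if (parentTraceIds != null && parentTraceIds.length == 1) {
      return parentTraceIds[0];
    }
    return UNKNOWN_TRACE_ID;
  }
}
```

With no parent or with several parents, the sketch keeps the value 0, so only the single-parent case changes behaviour.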





[jira] [Commented] (HDFS-11622) TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost

2017-04-05 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958062#comment-15958062
 ] 

James Taylor commented on HDFS-11622:
-

FYI, [~apurtell] - not sure if you've seen this one, but it's another important 
one.
