Hello,

I have some comments and concerns regarding the HTrace API, and was wondering 
whether extensions or changes would be considered. I'm listing the most important 
ones here; if there is interest, we can discuss them in more detail.

1) From the HTrace Developer Guide: 



TraceScope objects manage the lifespan of Span objects. When a TraceScope is 
created, it often comes with an associated Span object. When this scope is 
closed, the Span will be closed as well. “Closing” the scope means that the 
span is sent to a SpanReceiver for processing.


One of the implications of this model is that nested spans (for 
example, from instrumenting nested function calls) will be delivered to the 
receiver in reverse order, as the innermost function completes before the 
outermost. This may add complexity to the logic in the span receiver.
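
To illustrate the ordering, here is a minimal Python sketch of the 
deliver-on-close model. It is a toy, not the HTrace API: trace_scope and the 
delivered list (standing in for a SpanReceiver) are hypothetical names.

```python
from contextlib import contextmanager

delivered = []  # hypothetical stand-in for a SpanReceiver


@contextmanager
def trace_scope(name):
    """Toy scope: the span reaches the receiver only when the scope closes."""
    try:
        yield
    finally:
        delivered.append(name)


def inner():
    with trace_scope("inner"):
        pass


def outer():
    with trace_scope("outer"):
        inner()


outer()
print(delivered)  # ['inner', 'outer']: the innermost span arrives first
```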

Also, because information about a span is not delivered until the span is 
closed, delivery relies on the program not terminating abruptly. In Java this 
is not so much of a problem, but in C, what happens if a series of nested 
function calls is instrumented with spans and the innermost function crashes? 
As far as I can tell, none of the spans is delivered. This makes the tracing 
API unreliable for bug analysis.

Would you consider a change where each API call produces at least one event 
sent to the SpanReceiver? 
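
As a rough sketch of what I have in mind (Python, hypothetical names 
throughout: ListReceiver, EventTracer, begin and end are not part of HTrace), 
each call sends an event right away, so the receiver already knows which spans 
were open if the process dies:

```python
import time


class ListReceiver:
    """Hypothetical stand-in for a SpanReceiver that accepts events."""

    def __init__(self):
        self.events = []

    def receive(self, kind, span_id, name, timestamp):
        self.events.append((kind, span_id, name, timestamp))


class EventTracer:
    """Hypothetical tracer: every API call sends one event immediately."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.next_id = 0

    def begin(self, name):
        span_id = self.next_id
        self.next_id += 1
        self.receiver.receive("begin", span_id, name, time.time())
        return span_id

    def end(self, span_id, name):
        self.receiver.receive("end", span_id, name, time.time())


receiver = ListReceiver()
tracer = EventTracer(receiver)
outer = tracer.begin("outer")
inner = tracer.begin("inner")
# Suppose the process crashes here, before either span is closed.
seen = [(kind, name) for kind, _, name, _ in receiver.events]
print(seen)  # [('begin', 'outer'), ('begin', 'inner')]
```

This only helps, of course, if the receiver gets each event out of the 
process before the crash; the sketch does not address buffering.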

2) HTrace has a concept of spans having one or more parents. This makes it 
possible, for example, to capture the fact that a process makes an RPC call to 
another. However, there is no information about when within the span the 
caller calls the callee. A caller span may have two child spans, representing 
the fact that it made two RPC calls, but the order in which those calls were 
made is lost in the model. (Using the timestamps associated with the beginning 
of the callee spans is not feasible, as the RPC latencies may differ, or the 
clocks may simply not be aligned.) Also, the only relation captured by the API 
is between blocks.

I propose a more general API with a concept of spans and points (timestamped 
sets of annotations), and cause-effect relationships among points. An RPC call 
can be represented as a point in the caller span marked as cause, and a 
(begin) point in the callee span marked as effect. This is very flexible and 
makes it possible to capture all sorts of relationships, not just parent-child. 
For example, a DMA operation may be initiated in one block and captured as a 
point, with its completion captured as a point in a distinct block in the same 
entity (an abstraction for a unit of concurrency).
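
A minimal sketch of the points model, in Python. The names Point, Trace, 
point and causes are hypothetical and only meant to show the idea; the 
timestamps are made up to mimic unaligned clocks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Point:
    span: str         # span the point belongs to
    label: str        # annotation
    timestamp: float  # local clock of the entity that emitted the point


@dataclass
class Trace:
    points: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (cause, effect) pairs

    def point(self, span, label, timestamp):
        p = Point(span, label, timestamp)
        self.points.append(p)
        return p

    def causes(self, cause, effect):
        self.edges.append((cause, effect))


trace = Trace()
call_a = trace.point("caller", "rpc A sent", 10.0)
call_b = trace.point("caller", "rpc B sent", 12.0)
begin_a = trace.point("callee A", "begin", 503.0)  # callee clocks unaligned
begin_b = trace.point("callee B", "begin", 7.0)
trace.causes(call_a, begin_a)
trace.causes(call_b, begin_b)

# Order the calls by the cause points, which share the caller's clock.
by_cause = sorted(trace.edges, key=lambda edge: edge[0].timestamp)
call_order = [effect.span for _, effect in by_cause]
print(call_order)  # ['callee A', 'callee B']

# Ordering by the callee begin timestamps gives the wrong answer here.
by_effect = sorted(trace.edges, key=lambda edge: edge[1].timestamp)
naive_order = [effect.span for _, effect in by_effect]
print(naive_order)  # ['callee B', 'callee A']
```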

3) There doesn't seem to be any provision in the HTrace API for clock domains. 
In a distributed system, there may be processes running on the same host, 
processes running in the same cluster, and processes running in different 
clusters. Different domains may have different degrees of clock misalignment. 
Indicating this information through the API would allow the backend, or the UI 
that builds the trace, to make more accurate inferences about how concurrent 
entities line up.
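
For instance, a backend could use the domain to decide whether two timestamps 
can be ordered at all. The sketch below is Python with hypothetical names 
(Stamp, MAX_SKEW, happened_before), and the skew bounds are illustrative 
numbers, not measurements.

```python
from dataclasses import dataclass

# Assumed upper bounds on clock misalignment per domain, in seconds.
MAX_SKEW = {"same_host": 0.0, "same_cluster": 0.005, "cross_cluster": 0.25}


@dataclass(frozen=True)
class Stamp:
    time: float
    host: str
    cluster: str


def domain(a, b):
    """Narrowest clock domain shared by the two timestamps."""
    if a.cluster == b.cluster and a.host == b.host:
        return "same_host"
    if a.cluster == b.cluster:
        return "same_cluster"
    return "cross_cluster"


def happened_before(a, b):
    """True or False when the order is certain, None when within the skew."""
    skew = MAX_SKEW[domain(a, b)]
    if b.time - a.time > skew:
        return True
    if a.time - b.time > skew:
        return False
    return None


a = Stamp(1.000, "h1", "c1")
same_host = Stamp(1.002, "h1", "c1")
same_cluster = Stamp(1.002, "h2", "c1")
other_cluster = Stamp(2.000, "h9", "c2")

print(happened_before(a, same_host))      # True
print(happened_before(a, same_cluster))   # None
print(happened_before(a, other_cluster))  # True
```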

4) Does the API provide a mechanism for creating "delegated traces"? What I 
mean by this is that in some circumstances a thread may need to create traces 
on behalf of some other element that does not have such a capability. For 
example, a mobile device may have a custom tracing mechanism and attach the 
information to a request to the server. The server would then need to create 
the HTrace trace from the existing data passed in the request (including 
timestamps).
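
Roughly, the server side would look like the Python sketch below. Everything 
in it is hypothetical (SpanRecord, spans_from_request, and the request 
layout); the point is only that the span is built from timestamps supplied by 
the device rather than from the server's own clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanRecord:
    description: str
    begin_ms: int     # taken from the request, not from the server clock
    end_ms: int
    origin: str       # entity that measured the timestamps
    reported_by: str  # entity that created the span on its behalf


def spans_from_request(request, server_id):
    """Build spans on behalf of the device from data carried in the request."""
    records = []
    for item in request.get("client_trace", []):
        records.append(
            SpanRecord(
                description=item["name"],
                begin_ms=item["begin_ms"],
                end_ms=item["end_ms"],
                origin=request["device_id"],
                reported_by=server_id,
            )
        )
    return records


request = {
    "device_id": "phone-1",
    "client_trace": [{"name": "render", "begin_ms": 100, "end_ms": 180}],
}
spans = spans_from_request(request, "server-7")
print(spans[0].origin, spans[0].end_ms - spans[0].begin_ms)  # phone-1 80
```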

Let me know if there is interest in discussing changes at this level.
Thanks,
                    Roberto
