richardstartin opened a new pull request #8457:
URL: https://github.com/apache/pinot/pull/8457
This is a straw man proposal for a tracing API that allows detailed capture
of operator statistics, far beyond execution time. The tracing SPI delegates
to a default tracing implementation, which can be overridden at startup.
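Here is a minimal sketch of an SPI entry point that delegates to a default implementation and can be overridden at startup. It assumes hypothetical `Tracing`, `NoOpTracer` and `Tracer.register(long)` names. None of these are taken from the PR's code. In this sketch the first registration wins and later attempts are rejected.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, minimal Tracer contract used only by this sketch.
interface Tracer {
  void register(long requestId);
}

// Default tracer used when nothing is installed at startup: it does nothing.
final class NoOpTracer implements Tracer {
  static final NoOpTracer INSTANCE = new NoOpTracer();

  @Override
  public void register(long requestId) {
    // intentionally empty
  }
}

// Hypothetical SPI entry point: delegates to the default tracer unless another
// implementation is installed (once) at startup.
final class Tracing {
  private static final AtomicReference<Tracer> REGISTERED = new AtomicReference<>();

  private Tracing() {
  }

  /** Installs a tracer; returns false if one was already installed. */
  static boolean register(Tracer tracer) {
    return REGISTERED.compareAndSet(null, tracer);
  }

  /** Returns the installed tracer, or the no-op default if none was installed. */
  static Tracer getTracer() {
    Tracer tracer = REGISTERED.get();
    return tracer == null ? NoOpTracer.INSTANCE : tracer;
  }
}
```

Using `compareAndSet` makes the override race-free if several components try to install a tracer during startup.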
`Tracer` has two operations:
1. Register the requestId if tracing is enabled; the tracing implementation
is responsible for propagating it to query threads. The default
implementation is privileged in its use of `TraceCallable` and `TraceRunnable`,
but third-party implementations will use class transformation to add a
`requestId` field to `FutureTask`. The tracing implementation is also
responsible for maintaining lineage between parent and child spans, stack
maintenance, etc.
2. Start an operator span.
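The propagation in item 1 can be sketched with a `ThreadLocal` and a wrapping `Callable`. This is presumably similar in spirit to wrappers like `TraceCallable`. `RequestContext` and `RequestIdPropagatingCallable` are hypothetical names, not the PR's classes. The wrapper captures the submitting thread's request id when it is constructed. After the task runs, it restores the worker thread's previous state so pooled threads do not leak ids between queries.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical holder for the request id of the query running on this thread.
final class RequestContext {
  private static final ThreadLocal<Long> REQUEST_ID = new ThreadLocal<>();

  private RequestContext() {
  }

  static void register(long requestId) {
    REQUEST_ID.set(requestId);
  }

  static Long currentRequestId() {
    return REQUEST_ID.get();
  }

  static void clear() {
    REQUEST_ID.remove();
  }
}

// Captures the constructing (submitting) thread's request id and installs it on
// the worker thread only for the duration of the wrapped task.
final class RequestIdPropagatingCallable<V> implements Callable<V> {
  private final Long requestId = RequestContext.currentRequestId();
  private final Callable<V> delegate;

  RequestIdPropagatingCallable(Callable<V> delegate) {
    this.delegate = delegate;
  }

  @Override
  public V call() throws Exception {
    Long previous = RequestContext.currentRequestId();
    if (requestId == null) {
      RequestContext.clear();
    } else {
      RequestContext.register(requestId);
    }
    try {
      return delegate.call();
    } finally {
      // Restore the worker's previous state so pooled threads don't leak ids.
      if (previous == null) {
        RequestContext.clear();
      } else {
        RequestContext.register(previous);
      }
    }
  }
}
```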
Operator spans are closable, and are completed when closed.
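Because spans are closable, try-with-resources gives every operator span a well-defined end, even when evaluation throws. In this sketch `OperatorSpan` and `SpanDemo` are hypothetical. A completed span simply appends its operator name to a list, and nested spans complete innermost first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical span: closing it completes the span, exactly once.
final class OperatorSpan implements AutoCloseable {
  private final String operatorName;
  private final List<String> completedSpans;
  private final long startNanos = System.nanoTime();
  private long durationNanos;
  private boolean closed;

  OperatorSpan(String operatorName, List<String> completedSpans) {
    this.operatorName = operatorName;
    this.completedSpans = completedSpans;
  }

  long durationNanos() {
    return durationNanos;
  }

  // Narrows AutoCloseable.close() so callers need not handle a checked exception.
  @Override
  public void close() {
    if (closed) {
      return; // completing twice would double-count the span
    }
    closed = true;
    durationNanos = System.nanoTime() - startNanos;
    completedSpans.add(operatorName);
  }
}

final class SpanDemo {
  // Opens two nested spans and returns their names in completion order.
  static List<String> runNestedSpans() {
    List<String> completed = new ArrayList<>();
    try (OperatorSpan parent = new OperatorSpan("FilterOperator", completed)) {
      try (OperatorSpan child = new OperatorSpan("ScanOperator", completed)) {
        // evaluate the child operator's block here
      }
    }
    return completed;
  }
}
```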
`OperatorInvocationTrace` will be passed into block evaluation, and various
fields can be recorded into it:
```java
import org.apache.pinot.spi.data.FieldSpec;

// FilterType and Phase are referenced but not defined in this excerpt.
public interface OperatorInvocationTrace {

  /**
   * Sets the class of the operator. This allows various class-level properties
   * to be interrogated and cached in a {@link ClassValue}.
   * @param operator the class of the operator
   */
  void setOperatorClass(Class<?> operator);

  /**
   * Sets the number of docs scanned by the operator.
   * @param docsScanned how many docs were scanned
   */
  void setDocsScanned(long docsScanned);

  /**
   * Sets the number of bytes scanned by the operator, if this is possible to compute.
   * @param bytesScanned the number of bytes scanned
   */
  void setBytesProcessed(long bytesScanned);

  /**
   * If the operator is a filter, records the filter type (scan or index)
   * and the predicate type.
   * @param filterType SCAN or INDEX
   * @param predicateType e.g. BETWEEN, REGEXP_LIKE
   */
  void setFilterType(FilterType filterType, String predicateType);

  /**
   * Sets the phase of the query.
   * @param phase the phase
   */
  void setPhase(Phase phase);

  /**
   * Records whether type transformation took place during the operator's
   * invocation, and what the types were.
   * @param inputDataType the input data type
   * @param outputDataType the output data type
   */
  void setDataTypes(FieldSpec.DataType inputDataType, FieldSpec.DataType outputDataType);

  /**
   * Records the range of docIds processed during the operator invocation. This is
   * useful for identifying the range of records involved in a slow operator invocation.
   * @param firstDocId the first docId in the block
   * @param lastDocId the last docId in the block
   */
  void setDocIdRange(int firstDocId, int lastDocId);

  /**
   * If known, records the cardinality of the column within the segment this
   * operator was invoked on.
   * @param cardinality the number of distinct values
   */
  void setColumnCardinality(int cardinality);
}
```
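To illustrate how an operator might record values during block evaluation, this sketch uses a hypothetical three-method subset of the interface, named `InvocationTrace` so that it compiles without Pinot on the classpath. It pairs that subset with a map-backed recording implementation and a hypothetical `CountAboveThresholdOperator`. Treating `lastDocId` as inclusive is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical subset of OperatorInvocationTrace, reduced so the sketch is self-contained.
interface InvocationTrace {
  void setOperatorClass(Class<?> operator);

  void setDocsScanned(long docsScanned);

  void setDocIdRange(int firstDocId, int lastDocId);
}

// Recording implementation: keeps the values in insertion order.
final class RecordingInvocationTrace implements InvocationTrace {
  final Map<String, Object> values = new LinkedHashMap<>();

  @Override
  public void setOperatorClass(Class<?> operator) {
    values.put("operator", operator.getSimpleName());
  }

  @Override
  public void setDocsScanned(long docsScanned) {
    values.put("docsScanned", docsScanned);
  }

  @Override
  public void setDocIdRange(int firstDocId, int lastDocId) {
    values.put("docIdRange", firstDocId + "-" + lastDocId);
  }
}

// Hypothetical operator: counts values above a threshold in one block of docIds.
final class CountAboveThresholdOperator {
  private final int[] column;
  private final int threshold;

  CountAboveThresholdOperator(int[] column, int threshold) {
    this.column = column;
    this.threshold = threshold;
  }

  int evaluateBlock(int firstDocId, int lastDocId, InvocationTrace trace) {
    trace.setOperatorClass(getClass());
    int matches = 0;
    for (int docId = firstDocId; docId <= lastDocId; docId++) {
      if (column[docId] > threshold) {
        matches++;
      }
    }
    trace.setDocsScanned(lastDocId - firstDocId + 1L);
    trace.setDocIdRange(firstDocId, lastDocId);
    return matches;
  }
}
```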
The default implementation records none of these. Operator implementations
will need to be modified to record values into the `OperatorInvocationTrace`.
When these values are written to the default implementation of
`OperatorInvocationTrace`, dead code elimination is relied upon to remove the
overhead. The operator implementations will not be modified until this SPI is
agreed on.
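Below is a sketch of a no-op default trace and an instrumented call site. It uses a hypothetical reduced `BlockTrace` interface and a hypothetical `SumOperator`. If a call site only ever sees the no-op implementation, HotSpot's JIT can typically inline the empty method and drop the call, along with trivial, side-effect-free argument computation. This is a JIT optimization rather than a language guarantee. A call site that observes many implementations may keep the virtual call.

```java
// Hypothetical reduced trace contract (not the full OperatorInvocationTrace).
interface BlockTrace {
  void setDocsScanned(long docsScanned);

  void setColumnCardinality(int cardinality);
}

// Default trace: every method is empty. An enum gives a single shared instance.
enum NoOpBlockTrace implements BlockTrace {
  INSTANCE;

  @Override
  public void setDocsScanned(long docsScanned) {
    // intentionally empty; the JIT can eliminate inlined calls to this
  }

  @Override
  public void setColumnCardinality(int cardinality) {
    // intentionally empty
  }
}

// Hypothetical instrumented operator: the trace call is unconditional, with no
// "is tracing enabled" branch, and relies on the no-op being cheap.
final class SumOperator {
  static long sumBlock(int[] values, BlockTrace trace) {
    long sum = 0;
    for (int value : values) {
      sum += value;
    }
    trace.setDocsScanned(values.length);
    return sum;
  }
}
```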
The `Tracer` does not need to attach trace information to the query response;
where the trace information goes is implementation-defined. For example, it may
be written to a file or to an in-memory circular buffer that can be dumped on demand.
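One possible destination is a fixed-capacity circular buffer that keeps only the most recent trace records and returns them oldest first when dumped. `CircularTraceBuffer` is a hypothetical name. Records are plain strings to keep the sketch simple, and the coarse `synchronized` locking is a simplifying assumption rather than a tuned design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace sink: retains the last `capacity` records, overwriting the oldest.
final class CircularTraceBuffer {
  private final String[] records;
  private long written; // total number of records ever written

  CircularTraceBuffer(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.records = new String[capacity];
  }

  synchronized void record(String traceRecord) {
    records[(int) (written % records.length)] = traceRecord;
    written++;
  }

  /** Returns the retained records, oldest first, without clearing the buffer. */
  synchronized List<String> dump() {
    int size = (int) Math.min(written, records.length);
    List<String> out = new ArrayList<>(size);
    for (long i = written - size; i < written; i++) {
      out.add(records[(int) (i % records.length)]);
    }
    return out;
  }
}
```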