[I] [Feature] Track token usage metrics for embedding models [flink-agents]

via GitHub Wed, 24 Jun 2026 23:26:31 -0700


joeyutong opened a new issue, #858:
URL: https://github.com/apache/flink-agents/issues/858


   ### Search before asking
   
   - [x] I searched in the 
[issues](https://github.com/apache/flink-agents/issues) and found nothing 
similar.
   
   ### Description
   
   Embedding model calls currently do not consistently report token usage 
metrics.
   
   Chat models already have a token accounting path: provider usage is attached 
to the chat response and later recorded as model-level token metrics. Embedding 
models return only vectors, so provider usage returned by OpenAI-compatible or 
DashScope-style embedding APIs can be dropped before it reaches the metrics 
layer.
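
One way to keep provider usage from being dropped is to return it alongside the vectors. The following is a minimal sketch, not the project's API: `EmbeddingUsage`, `EmbeddingResult`, and `parse_embedding_response` are hypothetical names, and the payload shape assumes an OpenAI-compatible embeddings response (`data[].embedding`, `data[].index`, `usage.prompt_tokens`, `usage.total_tokens`). Providers that report only one of the two usage fields are handled by falling back to the other.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class EmbeddingUsage:
    """Hypothetical container for input-side token usage."""
    prompt_tokens: int
    total_tokens: int


@dataclass(frozen=True)
class EmbeddingResult:
    """Hypothetical return type: vectors plus optional provider usage."""
    vectors: List[List[float]]
    usage: Optional[EmbeddingUsage]


def parse_embedding_response(payload: Dict[str, Any]) -> EmbeddingResult:
    """Extract vectors and usage from an OpenAI-compatible payload (assumed shape)."""
    # Sort by "index" so the vectors line up with the input order.
    items = sorted(payload.get("data", []), key=lambda d: d.get("index", 0))
    vectors = [list(item["embedding"]) for item in items]

    raw = payload.get("usage")
    if not raw:
        # The provider reported no usage; signal that explicitly instead of guessing.
        return EmbeddingResult(vectors, None)

    # If only one of the two fields is present, use it for both.
    total = int(raw.get("total_tokens", raw.get("prompt_tokens", 0)))
    prompt = int(raw.get("prompt_tokens", total))
    return EmbeddingResult(vectors, EmbeddingUsage(prompt, total))
```

With a result type like this, a caller that only needs vectors reads `result.vectors`, while the metrics layer can read `result.usage` when it is present.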
   
   Affected paths include:
   
   - Direct Java or Python embedding model calls.
   - Vector store and RAG paths that auto-generate embeddings during `add`, 
`update`, or `query`.
   - Cross-language resource paths where the wrapper may receive an action 
metric group but the provider-side embedding resource performs the actual 
request.
   
   This makes it harder to validate and compare embedding model cost/usage, 
especially when a job mixes chat, embedding, and vector store operations. 
Embedding metrics do not need `completionTokens`, but should expose input-side 
token usage, for example `promptTokens` and `totalTokens`, under the same 
model/provider metric dimensions used by chat metrics where possible.
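
As an illustration of the metric shape described above, here is a minimal sketch using a hypothetical in-memory counter store. `TokenMetrics` and `record_embedding` are made-up names, the `(provider, model, metric name)` key is an assumed stand-in for the model/provider dimensions, and nothing here uses the actual Flink metrics API.

```python
from collections import defaultdict
from typing import DefaultDict, Optional, Tuple

# (provider, model, metric name); these dimensions are assumptions.
MetricKey = Tuple[str, str, str]


class TokenMetrics:
    """Hypothetical in-memory stand-in for a model-level metric group."""

    def __init__(self) -> None:
        self.counters: DefaultDict[MetricKey, int] = defaultdict(int)

    def _add(self, provider: str, model: str, name: str, value: int) -> None:
        self.counters[(provider, model, name)] += value

    def record_embedding(
        self,
        provider: str,
        model: str,
        prompt_tokens: int,
        total_tokens: Optional[int] = None,
    ) -> None:
        # Embeddings have no completion side, so completionTokens is never written.
        total = prompt_tokens if total_tokens is None else total_tokens
        self._add(provider, model, "promptTokens", prompt_tokens)
        self._add(provider, model, "totalTokens", total)


metrics = TokenMetrics()
metrics.record_embedding("example-provider", "example-embedding-model", 8)
metrics.record_embedding("example-provider", "example-embedding-model", 4, 4)
```

After the two calls, both `promptTokens` and `totalTokens` for that provider and model hold 12, and no `completionTokens` entry exists.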
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

