oscerd opened a new pull request, #23083:
URL: https://github.com/apache/camel/pull/23083

   ## Summary
   
   **Final AWS batch** of span decorators for `camel-telemetry`. Closes out AWS 
coverage for [CAMEL-23387](https://issues.apache.org/jira/browse/CAMEL-23387) 
by adding decorators for the **AI/ML** group: text-to-speech (Polly), image AI 
(Rekognition), OCR (Textract), speech-to-text (Transcribe), translation 
(Translate), NLP (Comprehend) and vector search (S3 Vectors).
   
   After this PR, all 36 AWS components in `components/camel-aws/` that have a 
Camel scheme will have a corresponding `SpanDecorator`. The only remaining 
follow-up on CAMEL-23387 is the Google Cloud decorators mentioned in the 
original ticket, which belong in a separate JIRA.
   
   ## Changes
   
   New `SpanDecorator` implementations under 
`org.apache.camel.telemetry.decorators`:
   
   - **`AwsPollySpanDecorator`** (`aws2-polly`) — Text-to-speech. Tags: 
`operation`, `voiceId`, `outputFormat`, `engine`, `languageCode`. Lexicon 
content (PLS XML), the synthesized audio's S3 destination (bucket/key), the SNS 
topic ARN for notifications, and the `requestCharacters` response counter are 
not surfaced.
   - **`AwsRekognitionSpanDecorator`** (`aws2-rekognition`) — Image/video AI. 
Tags: `operation`, `collectionId`, `jobId`, `jobName`, `faceId`. Image data 
(binary), the KMS key ID, large config objects (operations/output/human-loop 
config) and bulk facial-attribute / feature collections are not surfaced.
   - **`AwsTextractSpanDecorator`** (`aws2-textract`) — Document OCR. Tags: 
`operation`, `s3Bucket`, `s3Object`, `jobId`. The S3 object version, pagination 
tokens and feature-type collection are not surfaced.
   - **`AwsTranscribeSpanDecorator`** (`aws2-transcribe`) — Speech-to-text. 
Tags: `transcriptionJobName`, `languageCode`, `mediaFormat`, `mediaUri`. The 
`Transcribe2Constants` interface does not define an `OPERATION` header — 
operations are configured via the URI — so no `operation` tag is emitted (the 
span name from the URI already conveys the action). Vocabulary phrase lists, 
tag maps and the resource ARN are not surfaced.
   - **`AwsTranslateSpanDecorator`** (`aws2-translate`) — Translation. Tags: 
`operation`, `sourceLanguage`, `targetLanguage`. Custom-terminology name 
collections are not surfaced.
   - **`AwsComprehendSpanDecorator`** (`aws2-comprehend`) — NLP. Tags: 
`operation`, `languageCode`, `endpointArn` (custom-classifier endpoint ARN, an 
input identifier). Detection results (detected language, sentiment, scores) 
live on the OUT message and are not visible in `beforeTracingEvent`, so they 
are not surfaced.
   - **`AwsS3VectorsSpanDecorator`** (`aws2-s3-vectors`) — Vector search. Tags: 
`operation`, `vectorBucketName`, `vectorIndexName`, `vectorId`. Vector 
embedding data and query vectors (floats), metadata maps, similarity 
thresholds, distance metrics and response payloads (similarity scores, result 
counts, index status, bucket ARN) are not surfaced.
   
   All seven decorators extend `AbstractSpanDecorator` (these are producer-only 
or producer+polling-consumer components without messaging-style ordering 
semantics) and are registered alphabetically in 
`META-INF/services/org.apache.camel.telemetry.SpanDecorator`. Unit tests cover 
header-to-tag extraction for each decorator.
   
   Header constants are mirrored from each component's `*Constants` interface 
(with a Javadoc reference back to the source), matching the convention used by 
previous batches and `AzureServiceBusSpanDecorator`. This avoids creating hard 
dependencies from `camel-telemetry` to the AWS component modules.
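
   As a rough illustration of the header-to-tag extraction and the 
mirrored-constants convention described above, the sketch below uses stand-in 
types rather than the real `camel-telemetry` classes. `SketchSpan`, 
`RecordingSpan`, `TranslateHeaderMirror` and `TranslateDecoratorSketch` are 
hypothetical names, the method takes a plain header map instead of an 
`Exchange`, and the header strings are assumed values that should be checked 
against the component's `*Constants` interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for the telemetry span (hypothetical, not the Camel type). */
interface SketchSpan {
    void setTag(String key, String value);
}

/** Test double that records every tag set on it. */
final class RecordingSpan implements SketchSpan {
    final Map<String, String> tags = new LinkedHashMap<>();

    @Override
    public void setTag(String key, String value) {
        tags.put(key, value);
    }
}

/**
 * Local mirror of the component's header names, so that no dependency on the
 * AWS component module is needed. The string values are ASSUMED; verify them
 * against the component's constants interface before relying on them.
 */
final class TranslateHeaderMirror {
    static final String OPERATION = "CamelAwsTranslateOperation";
    static final String SOURCE_LANGUAGE = "CamelAwsTranslateSourceLanguage";
    static final String TARGET_LANGUAGE = "CamelAwsTranslateTargetLanguage";

    private TranslateHeaderMirror() {
    }
}

/** Copies the selected request headers onto the span as tags. */
final class TranslateDecoratorSketch {

    void beforeTracingEvent(SketchSpan span, Map<String, Object> headers) {
        copy(span, headers, TranslateHeaderMirror.OPERATION, "operation");
        copy(span, headers, TranslateHeaderMirror.SOURCE_LANGUAGE, "sourceLanguage");
        copy(span, headers, TranslateHeaderMirror.TARGET_LANGUAGE, "targetLanguage");
    }

    // Absent headers are skipped so that no empty tags are emitted.
    private static void copy(SketchSpan span, Map<String, Object> headers,
                             String header, String tag) {
        Object value = headers.get(header);
        if (value != null) {
            span.setTag(tag, value.toString());
        }
    }
}
```

   Because the header names are plain strings held locally, a sketch like this 
compiles without any AWS component module on the classpath, which is the point 
of mirroring the constants.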
   
   ### Tag selection rationale
   
   The same two rules are applied across batches 3 through 6:
   
   1. **Never emit values that may contain secrets, large payloads or PII** — 
image bytes, audio bytes, vector embeddings, lexicon content, vocabulary 
phrases, encrypted vector metadata.
   2. **Prefer the request _target_ over the response payload** — `voiceId`, 
`s3Bucket/s3Object`, `transcriptionJobName`, `vectorIndexName`, `collectionId` 
etc. Response data (detected sentiment in Comprehend, similarity scores in S3 
Vectors, request character counts in Polly) exists only after the call 
returns, so it is not visible in `beforeTracingEvent`.
   
   In addition to the two rules above, this batch follows the 
IAM-principal-minimization principle established in earlier review fixes (KMS 
`keyId`, CloudTrail `username`, IAM `userName`, EKS `roleArn`): no `userId` 
from Rekognition collections, no `kmsKeyId` from Rekognition, no `resourceArn` 
from Transcribe.
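
   One way to read these rules is as an allowlist: only explicitly named 
headers become tags, so anything else (including `userId` or `kmsKeyId`) is 
dropped by default. The sketch below follows that reading. 
`RekognitionTagAllowlistSketch` and every header string in it are hypothetical 
placeholders, not names taken from the actual decorator or component.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RekognitionTagAllowlistSketch {

    // Hypothetical header name -> tag name. Only these five can ever be emitted.
    private static final Map<String, String> HEADER_TO_TAG = new LinkedHashMap<>();

    static {
        HEADER_TO_TAG.put("SketchRekognitionOperation", "operation");
        HEADER_TO_TAG.put("SketchRekognitionCollectionId", "collectionId");
        HEADER_TO_TAG.put("SketchRekognitionJobId", "jobId");
        HEADER_TO_TAG.put("SketchRekognitionJobName", "jobName");
        HEADER_TO_TAG.put("SketchRekognitionFaceId", "faceId");
    }

    private RekognitionTagAllowlistSketch() {
    }

    /** Returns tags for allowlisted headers only; all other headers are ignored. */
    static Map<String, String> selectTags(Map<String, Object> headers) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : HEADER_TO_TAG.entrySet()) {
            Object value = headers.get(entry.getKey());
            if (value != null) {
                tags.put(entry.getValue(), value.toString());
            }
        }
        return tags;
    }
}
```

   With an allowlist, a header added to the component later stays out of the 
span until someone decides it is safe to emit, which fits the "never emit" 
rule better than a denylist would.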
   
   ## Test plan
   
   - [x] `mvn test` in `components/camel-telemetry` passes (133 tests, 
including 44 AWS decorator tests covering 36 components total — all AWS 
coverage on CAMEL-23387)
   - [x] Module-specific build (`mvn -DskipTests install`) succeeds
   - [x] No code style or formatter changes required
   
   ## Coverage on CAMEL-23387 (AWS — complete after this PR)
   
   | Batch | PR | Components |
   |---|---|---|
   | 1 | #23038 (merged) | SQS, SNS, Kinesis, S3 |
   | 2 | #23040 (merged) | DDB, DDB Streams, Lambda, EventBridge, SES, MQ, 
Kinesis Firehose, Bedrock |
   | 3 | #23045 (merged) | Athena, CloudWatch, KMS, MSK, Step Functions, 
Timestream, Redshift Data, CloudTrail |
   | 4 | #23077 (merged) | STS, IAM, Secrets Manager, Parameter Store, Security 
Hub, Config |
   | 5 | #23081 (merged) | EC2, ECS, EKS |
   | 6 | this PR | Polly, Rekognition, Textract, Transcribe, Translate, 
Comprehend, S3 Vectors |
   
   ## Follow-ups still pending
   
   - **Google Cloud decorators** mentioned in CAMEL-23387's description — 
separate scope, separate JIRA. Not in scope for this PR.
   - **Note on `aws-xray`**: the original "Compute & Tracing" follow-up listed 
in earlier PRs mentioned `camel-aws-xray`, but that module was **deprecated and 
removed** in commit ba9f8c5340a — it was a tracer integration with its own 
`SegmentDecorator` system, not a producer-style component, and there is nothing 
to add a `SpanDecorator` for.
   
   After this PR merges, the AWS portion of CAMEL-23387 is complete and the 
JIRA can be closed, pending the Google Cloud follow-up decision.
   
   ---
   
   _Claude Code on behalf of Andrea Cosentino_

