lewismc opened a new pull request, #2367:
URL: https://github.com/apache/tika/pull/2367

   This covers task #1 (Research and Setup) from 
[TIKA-4513](https://issues.apache.org/jira/browse/TIKA-4513), i.e.
   
   > 1. Research and Setup
   > 
   > Review OpenTelemetry Java getting-started guide and instrumentation 
registry for Tika-relevant libraries (e.g., auto-instrumentation for Jetty HTTP 
server, Apache HttpClient).
   > Set up a local dev environment with Tika Server, OpenTelemetry Java agent 
(latest stable release), and a test collector (e.g., [Grafana 
Alloy](https://grafana.com/docs/alloy/latest/) in Docker).
   > Prototype basic trace export for a sample /tika request.
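
For anyone who wants to poke at the last step, here is a minimal sketch of a sample `/tika` request using only the JDK HTTP client. It assumes a Tika Server is already running locally on its default port (9998) with the OpenTelemetry Java agent attached; the class name `SampleTikaRequest` and the payload are made up for illustration and are not part of this PR.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the PR.
class SampleTikaRequest {
    // Assumption: Tika Server is listening locally on its default port, 9998.
    static final String DEFAULT_BASE_URL = "http://localhost:9998";

    // Builds a PUT request against the /tika endpoint with a plain-text body.
    static HttpRequest buildRequest(String baseUrl, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/tika"))
                .header("Content-Type", "text/plain")
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest(DEFAULT_BASE_URL, "Hello, Tika!");
        // Sending requires a running server; with tracing enabled, this
        // request is what should show up as a trace in the collector.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

If tracing is enabled, each run of `main` should produce one trace for the request, which is a quick way to check the export path end to end.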
   
   I have lots of commentary to add... which I will do in due course. For now I 
was thinking of creating a video demo to better communicate the PR and what it 
offers.
   
   One important point: instrumentation (per OTEL) is disabled by default, 
so the impact on existing Tika users is very small.
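
To illustrate what "disabled by default" means in practice, here is a rough sketch of an opt-in toggle. The class `TelemetryToggle` and the variable name `EXAMPLE_TELEMETRY_ENABLED` are hypothetical and are not the names used in this PR; see `OPENTELEMETRY.md` for the real switch.

```java
import java.util.Map;

// Hypothetical illustration of an opt-in switch; not the PR's actual code.
class TelemetryToggle {
    // Made-up variable name, chosen for this example only.
    static final String ENABLED_KEY = "EXAMPLE_TELEMETRY_ENABLED";

    // Telemetry is on only when the variable is explicitly set to "true"
    // (case-insensitive). A missing or malformed value means "off".
    static boolean isEnabled(Map<String, String> env) {
        return Boolean.parseBoolean(env.get(ENABLED_KEY));
    }

    public static void main(String[] args) {
        System.out.println("telemetry enabled: " + isEnabled(System.getenv()));
    }
}
```

The point of the opt-in shape is that an absent setting leaves behavior unchanged for existing deployments.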
   
   Before I get around to asking people to review this PR, I want to agree on 
how to structure the constituent tasks in TIKA-4513. I will continue that 
conversation on the Jira ticket.
   
   In the meantime, if anyone wishes to take this for a spin, the markdown 
documentation (most notably `OPENTELEMETRY.md`) will get you up and running.
   
   **NOTE**: I used `Claude-4.5-sonnet` to help with
   - the markdown documents. Claude made lots of mistakes, which I fixed by 
hand during my review. That said, I have now stepped through this 
documentation line by line and I genuinely don't think I could have done it 
better myself given another week. I'm impressed and satisfied with the 
in-progress result.
   - some Javadoc, notably the Javadocs with loads of commentary. Again, I'm 
satisfied with the outcome and I think it will assist in a better understanding 
of the additions.
   - `TikaOpenTelemetryTest.java`... some basic unit test coverage which was 
convenient.
   - figuring out that `TikaOpenTelemetryConfig` had to implement 
`Initializable`. This saved me loads of study time, as it had been ages since I 
looked at tika-server internals and lots has changed.
   
   This instrumentation mega-project is likely similar in scale to tika-pipes. 
There is still loads of work to do. 
   
   You will also have noticed that I used 
[Jaeger](https://www.jaegertracing.io/) as a basic example. I will be providing 
another example using [Grafana Alloy as the OTEL 
collector](https://github.com/grafana/alloy), as it is much more closely aligned 
with $dayjob. That said, I did want to demonstrate the power of OTEL as a 
vendor-agnostic instrumentation framework. Very powerful indeed.
   
   In the meantime, here are a few screenshots which demonstrate what a trace 
containing two spans looks like in Jaeger. Pretty basic but exciting stuff.
   
   <img width="1710" height="1112" alt="Screenshot 2025-10-16 at 22 27 23" 
src="https://github.com/user-attachments/assets/d6a81991-6ccd-4d54-b743-a8cfc29a7286"
 />
   <img width="1710" height="1112" alt="Screenshot 2025-10-16 at 22 27 47" 
src="https://github.com/user-attachments/assets/a0e06925-9086-4d87-8dc8-1ab60a187aeb"
 />
   


