Lewis John McGibbney created NUTCH-3146:
-------------------------------------------
Summary: Add Resource Utilization Metrics for Fetcher
Key: NUTCH-3146
URL: https://issues.apache.org/jira/browse/NUTCH-3146
Project: Nutch
Issue Type: Sub-task
Components: metrics
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 1.22
This task adds resource utilization metrics to the Fetcher to improve
observability of thread utilization, queue depths, and wait times during
crawl jobs.
h3. Proposed Metrics
Thread Utilization:
* threads_active_max - Peak concurrent active threads
* threads_spin_waiting_max - Peak threads waiting for work
* threads_spin_waiting_sum_ms - Total spin-wait time in milliseconds
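As a rough illustration of how the peak and sum values above could be tracked across fetcher threads, here is a minimal sketch. The class FetcherThreadStats and its method names are hypothetical and are not part of Nutch; the issue does not specify how the counters would be implemented or reported.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAccumulator;

/** Hypothetical helper (not a Nutch class): tracks peak and summed thread metrics. */
final class FetcherThreadStats {
  // Current number of threads actively fetching.
  private final AtomicInteger active = new AtomicInteger();
  // Current number of threads spin-waiting for work.
  private final AtomicInteger spinWaiting = new AtomicInteger();
  // Running maxima; LongAccumulator applies Long::max atomically.
  private final LongAccumulator activeMax = new LongAccumulator(Long::max, 0L);
  private final LongAccumulator spinWaitingMax = new LongAccumulator(Long::max, 0L);
  // Total time spent spin-waiting, in milliseconds.
  private final AtomicLong spinWaitSumMs = new AtomicLong();

  void threadStarted() {
    activeMax.accumulate(active.incrementAndGet());
  }

  void threadFinished() {
    active.decrementAndGet();
  }

  void spinWaitStarted() {
    spinWaitingMax.accumulate(spinWaiting.incrementAndGet());
  }

  void spinWaitFinished(long waitedMs) {
    spinWaiting.decrementAndGet();
    spinWaitSumMs.addAndGet(waitedMs);
  }

  long threadsActiveMax() {
    return activeMax.get();
  }

  long threadsSpinWaitingMax() {
    return spinWaitingMax.get();
  }

  long threadsSpinWaitingSumMs() {
    return spinWaitSumMs.get();
  }
}
```

The same current-value-plus-running-maximum pattern would apply to any peak metric that is sampled whenever the underlying count changes.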
Queue Depth:
* queues_total_size_max - Peak URLs queued
* queues_count_max - Peak number of host/domain queues
* queues_blocked_max - Peak queues blocked due to exceptions
* queues_in_progress_max - Peak in-flight fetches
Wait Time:
* queue_wait_time - Latency percentiles (p50/p95/p99) for queue acquisition
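For readers unfamiliar with latency percentiles, the sketch below shows one simple way p50/p95/p99 could be derived from recorded wait times, using the nearest-rank method over raw samples. The class QueueWaitPercentiles is hypothetical and not part of Nutch; the issue does not say how the percentiles would be computed, and an actual implementation might use a histogram or streaming estimate instead of retaining every sample.

```java
import java.util.Arrays;

/** Hypothetical helper (not a Nutch class): nearest-rank percentiles over wait-time samples. */
final class QueueWaitPercentiles {

  /**
   * Returns the smallest sample such that at least p percent of all samples
   * are less than or equal to it (nearest-rank definition).
   */
  static long percentile(long[] samplesMs, double p) {
    if (samplesMs.length == 0) {
      throw new IllegalArgumentException("no samples recorded");
    }
    if (p <= 0.0 || p > 100.0) {
      throw new IllegalArgumentException("p must be in (0, 100]");
    }
    // Sort a copy so the caller's array is left untouched.
    long[] sorted = samplesMs.clone();
    Arrays.sort(sorted);
    // 1-based rank of the requested percentile.
    int rank = (int) Math.ceil(p * sorted.length / 100.0);
    return sorted[Math.max(rank, 1) - 1];
  }
}
```

With ten samples of 10, 20, ..., 100 ms, `percentile(samples, 50)` selects the fifth smallest value and `percentile(samples, 95)` selects the largest, which shows how a small sample count makes the upper percentiles collapse onto the maximum.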
These metrics enable Nutch admins to:
* Identify thread contention and queue bottlenecks
* Tune fetcher.threads.fetch and fetcher.threads.per.queue settings
* Detect capacity issues across crawl jobs
* Compare performance between different configurations
Note: these metrics are designed for post-hoc analysis rather than real-time
monitoring.