This is an automated email from the ASF dual-hosted git repository.

wu-sheng pushed a commit to branch feat/performance-config
in repository https://gitbox.apache.org/repos/asf/skywalking-horizon-ui.git

commit e10ea87f1a9eb108cce9bb4babe1f665a2ffd64e
Author: Wu Sheng <[email protected]>
AuthorDate: Mon Jun 22 20:49:58 2026 +0800

    feat(config): performance section in horizon.yaml - relocate fan-out/caps tuning

    Operational tuning that was hardcoded in routes or misplaced inside
    published dashboard templates now lives in one operator-owned,
    hot-reloaded `performance` section in horizon.yaml. Pure relocation:
    defaults equal the prior built-in values, enforced by a new
    schema-default-vs-example drift test.

    - performance.bulk: per-route bulk size + concurrency for the
      topology / 3D-map / landing / dashboard OAP fan-outs (was hardcoded
      150/200/4, 6/8, 6).
    - performance.limits: the service-map render valve (5000/15000) and
      per-request record caps for traces / logs / browser logs
      (maxPageSize).
    - The 3D map's metric fan-out moved out of its OAP-published template
      (the `pipeline` block) into performance.bulk.infra3d; the BFF
      injects it into the config response so the UI is unchanged, and a
      stale template still carrying `pipeline` is accept-and-ignored.
    - Unified page-size pickers (20/30/50/100) across Traces, Logs, and
      Browser Logs (Browser Logs gains a picker; the trace cap drops
      200 -> 100 to match).
    - Dockerfile sets a default NODE_OPTIONS=--max-old-space-size; docs
      cover Node-heap sizing against the in-memory source-map budget.
    - Fixed an example.yaml rbac drift (roles were missing infra-3d:read);
      the new drift test keeps schema defaults and horizon.example.yaml
      byte-identical.

    Validated: BFF+UI type-check, lint, 124+113 tests, both builds,
    license-eye 0 invalid; live demo-OAP smoke (topology + traces)
    unchanged.
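
The per-request record caps described in the message above can be illustrated with a minimal sketch. The `clampPageSize` signature follows the helper shown in this commit's diff; `PageSizeLimits` and `exampleLimits` are hypothetical stand-ins for `performance.limits.maxPageSize` as it would be read from horizon.yaml, not names from the repository.

```typescript
// Sketch only: mirrors the clamp helper that appears in this commit's diff.
// `PageSizeLimits` and `exampleLimits` are hypothetical; in the BFF the
// values come from `performance.limits.maxPageSize` in horizon.yaml.
interface PageSizeLimits {
  traces: number;
  logs: number;
  browserLogs: number;
}

// Assumed to equal the schema defaults (100 for each event list).
const exampleLimits: PageSizeLimits = { traces: 100, logs: 100, browserLogs: 100 };

function clampPageSize(requested: number | undefined, fallback: number, max: number): number {
  // Missing, non-finite, or sub-1 values fall back to the route default.
  if (!Number.isFinite(requested as number) || (requested as number) < 1) return fallback;
  // Anything else is rounded and capped at the configured maximum.
  return Math.min(max, Math.round(requested as number));
}

// A direct API caller asking for 100000 records is cut down to the cap.
const oversized = clampPageSize(100000, 20, exampleLimits.traces);
// An omitted pageSize uses the route's fallback.
const omitted = clampPageSize(undefined, 50, exampleLimits.logs);
// An in-range request passes through unchanged.
const inRange = clampPageSize(30, 50, exampleLimits.browserLogs);

console.log(oversized, omitted, inRange);
```

One property worth noting in this sketch: the fallback is returned as-is, so if an operator configured a cap below a route's fallback, requests that omit `pageSize` would still receive the fallback value. Whether that matters depends on how low the cap is set in a given deployment.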
--- CHANGELOG.md | 7 +- Dockerfile | 4 +- .../bff/src/bundled_templates/infra-3d/config.json | 6 -- apps/bff/src/config/schema.test.ts | 70 ++++++++++++++++++++ apps/bff/src/config/schema.ts | 76 ++++++++++++++++++++++ apps/bff/src/http/config/infra-3d.ts | 16 ++++- apps/bff/src/http/query/browser-errors.ts | 10 +-- apps/bff/src/http/query/dashboard.ts | 2 +- apps/bff/src/http/query/deployment.ts | 5 +- apps/bff/src/http/query/endpoint-dependency.ts | 5 +- apps/bff/src/http/query/instance-topology.ts | 5 +- apps/bff/src/http/query/landing.ts | 12 ++-- apps/bff/src/http/query/log.ts | 12 ++-- apps/bff/src/http/query/topology.ts | 15 ++--- apps/bff/src/http/query/trace.ts | 26 ++++---- apps/bff/src/logic/infra-3d/types.ts | 14 ---- apps/bff/src/logic/infra-3d/validate.ts | 19 ++---- .../browser-errors/LayerBrowserErrorsView.vue | 13 +++- apps/ui/src/layer/logs/LayerLogsView.vue | 1 + apps/ui/src/layer/traces/LayerTracesView.vue | 3 +- apps/ui/src/layer/traces/LayerZipkinTracesView.vue | 3 +- docs/operate/infra-3d-map.md | 8 +-- docs/setup/container-image.md | 15 +++++ docs/setup/horizon-yaml.md | 60 +++++++++++++++++ horizon.example.yaml | 40 +++++++++++- 25 files changed, 358 insertions(+), 89 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9dfdb63..b381898 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,7 +6,12 @@ The version line is shared by every package in the monorepo (apps + shared packa ## 1.0.0 -(In development — fill in highlights here before cutting the release.) +### Performance & behavior tuning + +- **New `performance` section in `horizon.yaml`.** Tune how hard the BFF fans metric queries out to OAP — per-route bulk (request) sizes and concurrency for the topology, 3D-map, landing, and dashboard fan-outs — plus protective caps: the service-map render valve (`topologyMaxNodes` / `topologyMaxEdges`) and per-request record caps for traces / logs / browser logs. 
Operational, hot-reloaded, per-deployment; defaults match the previous built-in values, so the whole block is optional. Rais [...] +- **3D-map fan-out tuning moved out of the dashboard template into `horizon.yaml`** (`performance.bulk.infra3d`). These metric concurrency / batch knobs were operational settings misplaced in a published-to-OAP dashboard template (not even surfaced in the admin editor); a stale template still carrying the old `pipeline` block is now accepted and ignored. +- **Unified page-size pickers across the event lists.** Traces, Logs, and Browser Logs share a `20 / 30 / 50 / 100` page-size dropdown — and Browser Logs gains a picker it never had (it had a fixed 100). Each picker's max matches the server-side fetch cap in `performance.limits.maxPageSize`. +- **Node memory sizing guidance.** The container image now sets a default `NODE_OPTIONS=--max-old-space-size`, and the docs cover sizing the Node heap to your container memory limit and the in-memory source-map budget. ## 0.7.0 diff --git a/Dockerfile b/Dockerfile index d77e34b..fe998dc 100644 --- a/Dockerfile +++ b/Dockerfile @@ -79,7 +79,9 @@ ENV NODE_ENV=production \ HORIZON_SETUP_FILE=/data/horizon-setup.json \ HORIZON_ALARMS_FILE=/data/horizon-alarms.json \ HORIZON_WIRE_LOG_FILE=/data/horizon-wire.jsonl \ - HORIZON_SOURCEMAPS_DIR=/app/sourcemaps + HORIZON_SOURCEMAPS_DIR=/app/sourcemaps \ + # Match this to the container memory limit and your sourceMaps budget — the in-heap map cache lives inside it. 
+ NODE_OPTIONS=--max-old-space-size=768 USER horizon EXPOSE 8081 diff --git a/apps/bff/src/bundled_templates/infra-3d/config.json b/apps/bff/src/bundled_templates/infra-3d/config.json index 4bb6e87..967a026 100644 --- a/apps/bff/src/bundled_templates/infra-3d/config.json +++ b/apps/bff/src/bundled_templates/infra-3d/config.json @@ -8,12 +8,6 @@ "crossLevelCall": { "color": "#f0a04b", "style": "solid", "arrow": true }, "intraCall": { "color": "rgba(255,255,255,0.4)", "style": "solid", "arrow": false } }, - "pipeline": { - "metricChunkSize": 6, - "metricConcurrency": 4, - "topologyConcurrency": 4, - "templateConcurrency": 8 - }, "unknownLayer": { "level": "middleware", "badge": "unclassified" diff --git a/apps/bff/src/config/schema.test.ts b/apps/bff/src/config/schema.test.ts new file mode 100644 index 0000000..f742523 --- /dev/null +++ b/apps/bff/src/config/schema.test.ts @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import { readFileSync } from 'node:fs'; +import { fileURLToPath } from 'node:url'; +import { dirname, resolve } from 'node:path'; +import { describe, expect, it } from 'vitest'; +import YAML from 'yaml'; +import { configSchema } from './schema.js'; +import { interpolateEnv } from './loader.js'; + +describe('configSchema defaults', () => { + it('parses an empty object — every non-optional field has a default', () => { + expect(() => configSchema.parse({})).not.toThrow(); + }); +}); + +// Guard against horizon.example.yaml drifting from the schema defaults. The +// example is "reference, not override" — every value it shows is meant to +// equal what the BFF runs with when the block is omitted. If a default +// changes (or someone edits the example to a non-default), this fails so the +// two are reconciled before merge. +describe('horizon.example.yaml matches schema defaults', () => { + const here = dirname(fileURLToPath(import.meta.url)); + const examplePath = resolve(here, '../../../../horizon.example.yaml'); + const example = YAML.parse(interpolateEnv(readFileSync(examplePath, 'utf8'))) ?? {}; + const defaults = configSchema.parse({}) as Record<string, unknown>; + + // YAML omits a value as null; the schema models the same absence as the + // empty string (interpolated `${VAR:}`). Treat the two as equal so an + // unset path doesn't read as drift. + const norm = (v: unknown): unknown => (v === null || v === undefined ? '' : v); + + // Walk only what the example actually declares; the example is allowed to + // omit fields (those fall back to defaults at runtime). Every scalar / + // array it DOES carry must match the parsed default at the same path. 
+ const walk = (exVal: unknown, defVal: unknown, path: string): void => { + if (Array.isArray(exVal) || (exVal !== null && typeof exVal === 'object')) { + if (Array.isArray(exVal)) { + expect(defVal, `${path} should be an array in defaults`).toEqual(exVal); + return; + } + const exObj = exVal as Record<string, unknown>; + const defObj = (defVal ?? {}) as Record<string, unknown>; + for (const key of Object.keys(exObj)) { + walk(exObj[key], defObj[key], path ? `${path}.${key}` : key); + } + return; + } + expect(norm(exVal), `${path} drifted from schema default`).toEqual(norm(defVal)); + }; + + it('every value present in the example equals the schema default', () => { + walk(example, defaults, ''); + }); +}); diff --git a/apps/bff/src/config/schema.ts b/apps/bff/src/config/schema.ts index 51f6fd6..c220d3b 100644 --- a/apps/bff/src/config/schema.ts +++ b/apps/bff/src/config/schema.ts @@ -385,6 +385,81 @@ const layersSchema = z .strict() .default({ excluded: DEFAULT_EXCLUDED_LAYERS }); +// ──────────────────────────────────────────────────────────────────── +// Performance / behavior tuning — how hard the BFF fans queries out to +// OAP, plus the render / fetch caps that protect storage. OPERATIONAL, +// per-deployment, hot-reloaded — NOT dashboard content (those live in +// templates published to OAP). Defaults equal the built-in values, so +// omitting this block changes nothing. Every value is clamped to a hard +// ceiling (the `.max()` below) — config can lower, never exceed it. +const performanceSchema = z + .object({ + bulk: z + .object({ + // Service-map family routes (topology / instance-topology / + // deployment / endpoint-dependency). `*BulkSize` = aliased MQE + // fragments per OAP request; `concurrency` = parallel requests. 
+ topology: z + .object({ + nodeBulkSize: z.number().int().min(1).max(500).default(150), + edgeBulkSize: z.number().int().min(1).max(500).default(200), + concurrency: z.number().int().min(1).max(16).default(4), + }) + .strict() + .default({}), + // 3D infrastructure-map metric fan-out (relocated from the 3D + // template's former `pipeline` block). + infra3d: z + .object({ + metricBulkSize: z.number().int().min(1).max(12).default(6), + metricConcurrency: z.number().int().min(1).max(8).default(4), + topologyConcurrency: z.number().int().min(1).max(16).default(4), + templateConcurrency: z.number().int().min(1).max(32).default(8), + }) + .strict() + .default({}), + // Per-layer landing: metric columns fetched in service batches. + landing: z + .object({ + bulkSize: z.number().int().min(1).max(12).default(6), + concurrency: z.number().int().min(1).max(16).default(8), + }) + .strict() + .default({}), + // Dashboard widget metric fan-out. + dashboard: z + .object({ + bulkSize: z.number().int().min(1).max(12).default(6), + }) + .strict() + .default({}), + }) + .strict() + .default({}), + limits: z + .object({ + // Service-map render valve: a graph larger than this is rejected + // with a "narrow the scope" notice rather than drawn unreadably. + topologyMaxNodes: z.number().int().positive().default(5000), + topologyMaxEdges: z.number().int().positive().default(15000), + // Max RECORDS per request (the OAP storage LIMIT) for each event + // list — NOT a page count. The UI page-size picker maxes at the + // same value, so a client can't out-ask the dropdown. 
+ maxPageSize: z + .object({ + traces: z.number().int().min(1).max(500).default(100), + logs: z.number().int().min(1).max(500).default(100), + browserLogs: z.number().int().min(1).max(500).default(100), + }) + .strict() + .default({}), + }) + .strict() + .default({}), + }) + .strict() + .default({}); + export const configSchema = z .object({ server: serverSchema.default({}), @@ -399,6 +474,7 @@ export const configSchema = z debugLog: debugLogSchema, query: querySchema, sourceMaps: sourceMapsSchema, + performance: performanceSchema, // Deprecated + ignored. The 3D-map config moved to OAP (a template kind); // the old file-backed `infra3d.file` knob is gone. Accepted here (rather // than rejected by `.strict()`) so an existing config carrying the block diff --git a/apps/bff/src/http/config/infra-3d.ts b/apps/bff/src/http/config/infra-3d.ts index be03c3f..84ac1f4 100644 --- a/apps/bff/src/http/config/infra-3d.ts +++ b/apps/bff/src/http/config/infra-3d.ts @@ -60,7 +60,21 @@ export function registerInfra3dConfigRoutes( { preHandler: auth }, async (_req: FastifyRequest, reply: FastifyReply) => { const cfg = await resolveEffectiveConfig(deps); - return reply.send(cfg); + // The metric fan-out budget is OPERATIONAL (per-deployment, hot- + // reloaded), so it lives in horizon.yaml — NOT the published template. + // Inject it server-side so the UI keeps reading `cfg.pipeline.*`; this + // overrides any stale `pipeline` a hand-edited / imported template row + // might still carry (validate.ts accepts-and-ignores it). 
+ const perf = deps.config.current.performance.bulk.infra3d; + return reply.send({ + ...cfg, + pipeline: { + metricChunkSize: perf.metricBulkSize, + metricConcurrency: perf.metricConcurrency, + topologyConcurrency: perf.topologyConcurrency, + templateConcurrency: perf.templateConcurrency, + }, + }); }, ); } diff --git a/apps/bff/src/http/query/browser-errors.ts b/apps/bff/src/http/query/browser-errors.ts index 5e78455..0a86690 100644 --- a/apps/bff/src/http/query/browser-errors.ts +++ b/apps/bff/src/http/query/browser-errors.ts @@ -49,10 +49,12 @@ export interface BrowserErrorsRouteDeps { } const DEFAULT_WINDOW_MIN = 30; -const MAX_PAGE_SIZE = 100; -function clampPageSize(requested: number | undefined, fallback: number): number { +/** OAP feeds `paging.pageSize` straight to storage as a LIMIT. The cap + * is `performance.limits.maxPageSize.browserLogs` (default 100); + * mirror that server-side so the cap holds against direct API callers. */ +function clampPageSize(requested: number | undefined, fallback: number, max: number): number { if (!Number.isFinite(requested as number) || (requested as number) < 1) return fallback; - return Math.min(MAX_PAGE_SIZE, Math.round(requested as number)); + return Math.min(max, Math.round(requested as number)); } function defaultWindow( @@ -194,7 +196,7 @@ export function registerBrowserErrorsRoute(app: FastifyInstance, deps: BrowserEr queryDuration: withColdStage(req, { start: window.start, end: window.end, step: 'SECOND' }), paging: { pageNum: Math.max(1, Math.round(body.page ?? 
1)), - pageSize: clampPageSize(body.pageSize, 50), + pageSize: clampPageSize(body.pageSize, 50, deps.config.current.performance.limits.maxPageSize.browserLogs), }, }; diff --git a/apps/bff/src/http/query/dashboard.ts b/apps/bff/src/http/query/dashboard.ts index 42b3dd7..fed6c03 100644 --- a/apps/bff/src/http/query/dashboard.ts +++ b/apps/bff/src/http/query/dashboard.ts @@ -792,7 +792,7 @@ export function registerDashboardQueryRoute(app: FastifyInstance, deps: Dashboar // round-trip while staying inside OAP's per-query budget. // Gate-skipped widgets are excluded here (their wIdx keeps its // original index so Step 3's result map still lines up). - const MAX_WIDGETS_PER_BATCH = 6; + const MAX_WIDGETS_PER_BATCH = cfgCurrent.performance.bulk.dashboard.bulkSize; const batchWidgets = widgets .map((widget, wIdx) => ({ widget, wIdx })) .filter(({ wIdx }) => !skipped.has(wIdx)); diff --git a/apps/bff/src/http/query/deployment.ts b/apps/bff/src/http/query/deployment.ts index 34e65ec..b4461a7 100644 --- a/apps/bff/src/http/query/deployment.ts +++ b/apps/bff/src/http/query/deployment.ts @@ -284,6 +284,7 @@ export function registerDeploymentRoute( } const cfgCurrent = deps.config.current; + const perf = cfgCurrent.performance; const opts = buildOapOpts(cfgCurrent, deps.fetch); const offset = await getServerOffsetMinutes(deps.config, deps.fetch); // Honor the SPA's topbar picker triplet; else fall back to the @@ -507,8 +508,8 @@ export function registerDeploymentRoute( // track failed metric chunks → surface "blank may be unavailable, not zero" const mstats = { failed: 0, total: 0 }; const [nodeEnv, edgeEnv] = await Promise.all([ - fetchAliasedChunks<MqeShape>(opts, nodeFragments, 150, 'DeploymentNodeMetrics', 4, mstats), - fetchAliasedChunks<MqeShape>(opts, edgeFragments, 200, 'DeploymentEdgeMetrics', 4, mstats), + fetchAliasedChunks<MqeShape>(opts, nodeFragments, perf.bulk.topology.nodeBulkSize, 'DeploymentNodeMetrics', perf.bulk.topology.concurrency, mstats), + 
fetchAliasedChunks<MqeShape>(opts, edgeFragments, perf.bulk.topology.edgeBulkSize, 'DeploymentEdgeMetrics', perf.bulk.topology.concurrency, mstats), ]); for (const [alias, shape] of Object.entries(nodeEnv)) { diff --git a/apps/bff/src/http/query/endpoint-dependency.ts b/apps/bff/src/http/query/endpoint-dependency.ts index f2052da..6005911 100644 --- a/apps/bff/src/http/query/endpoint-dependency.ts +++ b/apps/bff/src/http/query/endpoint-dependency.ts @@ -287,6 +287,7 @@ export function registerEndpointDependencyRoute( } const cfgCurrent = deps.config.current; + const perf = cfgCurrent.performance; const opts = buildOapOpts(cfgCurrent, deps.fetch); const offset = await getServerOffsetMinutes(deps.config, deps.fetch); // Honor the SPA's topbar picker triplet; else fall back to the @@ -472,8 +473,8 @@ export function registerEndpointDependencyRoute( // track failed metric chunks → surface "blank may be unavailable, not zero" const mstats = { failed: 0, total: 0 }; const [nodeEnv, edgeEnv] = await Promise.all([ - fetchAliasedChunks<MqeShape>(opts, nodeFragments, 150, 'EndpointMetrics', 4, mstats), - fetchAliasedChunks<MqeShape>(opts, edgeFragments, 200, 'EndpointEdgeMetrics', 4, mstats), + fetchAliasedChunks<MqeShape>(opts, nodeFragments, perf.bulk.topology.nodeBulkSize, 'EndpointMetrics', perf.bulk.topology.concurrency, mstats), + fetchAliasedChunks<MqeShape>(opts, edgeFragments, perf.bulk.topology.edgeBulkSize, 'EndpointEdgeMetrics', perf.bulk.topology.concurrency, mstats), ]); for (const [alias, shape] of Object.entries(nodeEnv)) { diff --git a/apps/bff/src/http/query/instance-topology.ts b/apps/bff/src/http/query/instance-topology.ts index fb02f65..b940c5a 100644 --- a/apps/bff/src/http/query/instance-topology.ts +++ b/apps/bff/src/http/query/instance-topology.ts @@ -255,6 +255,7 @@ export function registerInstanceTopologyRoute( } const cfgCurrent = deps.config.current; + const perf = cfgCurrent.performance; const opts = buildOapOpts(cfgCurrent, deps.fetch); const 
offset = await getServerOffsetMinutes(deps.config, deps.fetch); // Honor the SPA's topbar picker triplet; else fall back to the @@ -386,8 +387,8 @@ export function registerInstanceTopologyRoute( // track failed metric chunks → surface "blank may be unavailable, not zero" const mstats = { failed: 0, total: 0 }; const [nodeEnv, edgeEnv] = await Promise.all([ - fetchAliasedChunks<MqeShape>(opts, nodeFragments, 150, 'InstanceNodeMetrics', 4, mstats), - fetchAliasedChunks<MqeShape>(opts, edgeFragments, 200, 'InstanceEdgeMetrics', 4, mstats), + fetchAliasedChunks<MqeShape>(opts, nodeFragments, perf.bulk.topology.nodeBulkSize, 'InstanceNodeMetrics', perf.bulk.topology.concurrency, mstats), + fetchAliasedChunks<MqeShape>(opts, edgeFragments, perf.bulk.topology.edgeBulkSize, 'InstanceEdgeMetrics', perf.bulk.topology.concurrency, mstats), ]); for (const [alias, shape] of Object.entries(nodeEnv)) { diff --git a/apps/bff/src/http/query/landing.ts b/apps/bff/src/http/query/landing.ts index 5f176e7..0341aaf 100644 --- a/apps/bff/src/http/query/landing.ts +++ b/apps/bff/src/http/query/landing.ts @@ -97,8 +97,8 @@ const DEFAULT_WINDOW_MIN = 60; // The batches then drain through a bounded-concurrency pool so a large // layer fans out in controlled waves, not a thundering herd. The number of // services probed per request is itself bounded by `query.landingServiceCap`. -const MAX_SERVICES_PER_BATCH = 6; -const LANDING_BATCH_CONCURRENCY = 8; +// Batch size + pool width are config-tunable via +// `performance.bulk.landing.{bulkSize,concurrency}` (read in the handler). /** Run `fn` over `items` with at most `limit` promises in flight at once. 
*/ async function mapPool<T>(items: T[], limit: number, fn: (item: T) => Promise<void>): Promise<void> { @@ -265,6 +265,8 @@ export function registerLandingRoute(app: FastifyInstance, deps: LandingRouteDep const cfg = parsed.data; const oapLayer = layerKey.toUpperCase(); const cfgCurrent = deps.config.current; + const { bulkSize: maxServicesPerBatch, concurrency: batchConcurrency } = + cfgCurrent.performance.bulk.landing; const opts = buildOapOpts(cfgCurrent, deps.fetch); const offset = await getServerOffsetMinutes(deps.config, deps.fetch); // Honor the SPA's topbar time picker when all three triplet fields @@ -353,10 +355,10 @@ export function registerLandingRoute(app: FastifyInstance, deps: LandingRouteDep const out = new Map<string, MqeResultShape>(); if (svcList.length === 0 || !cols.some((c) => !!c.expression)) return out; const chunks: (typeof svcList)[] = []; - for (let i = 0; i < svcList.length; i += MAX_SERVICES_PER_BATCH) { - chunks.push(svcList.slice(i, i + MAX_SERVICES_PER_BATCH)); + for (let i = 0; i < svcList.length; i += maxServicesPerBatch) { + chunks.push(svcList.slice(i, i + maxServicesPerBatch)); } - await mapPool(chunks, LANDING_BATCH_CONCURRENCY, async (batch) => { + await mapPool(chunks, batchConcurrency, async (batch) => { const fragments: string[] = []; const back: { a: string; key: string }[] = []; batch.forEach((svc, li) => { diff --git a/apps/bff/src/http/query/log.ts b/apps/bff/src/http/query/log.ts index 0a9aa27..74cc6cf 100644 --- a/apps/bff/src/http/query/log.ts +++ b/apps/bff/src/http/query/log.ts @@ -53,12 +53,12 @@ export interface LogRouteDeps { const DEFAULT_WINDOW_MIN = 30; /** OAP feeds `paging.pageSize` straight to its storage layer as a - * LIMIT clause. The UI picker caps at 100; mirror that server-side so - * the cap holds against direct API callers. */ -const MAX_LOG_PAGE_SIZE = 100; -function clampPageSize(requested: number | undefined, fallback: number): number { + * LIMIT clause. 
The cap is `performance.limits.maxPageSize.logs` + * (default 100); mirror that server-side so the cap holds against + * direct API callers. */ +function clampPageSize(requested: number | undefined, fallback: number, max: number): number { if (!Number.isFinite(requested as number) || (requested as number) < 1) return fallback; - return Math.min(MAX_LOG_PAGE_SIZE, Math.round(requested as number)); + return Math.min(max, Math.round(requested as number)); } /** Build the log query window as SECOND-precision strings. Logs are @@ -223,7 +223,7 @@ export function registerLogRoute(app: FastifyInstance, deps: LogRouteDeps): void queryDuration: withColdStage(req, { start: window.start, end: window.end, step: 'SECOND' }), paging: { pageNum: Math.max(1, Math.round(body.page ?? 1)), - pageSize: clampPageSize(body.pageSize, 50), + pageSize: clampPageSize(body.pageSize, 50, deps.config.current.performance.limits.maxPageSize.logs), }, }; diff --git a/apps/bff/src/http/query/topology.ts b/apps/bff/src/http/query/topology.ts index a718b9d..8594d3a 100644 --- a/apps/bff/src/http/query/topology.ts +++ b/apps/bff/src/http/query/topology.ts @@ -204,11 +204,6 @@ export function seriesFromMqe(env: MqeShape | undefined): Array<number | null> | }); } -// Safety valve: above this the graph can't render legibly and risks OOMing the -// browser, so the route rejects with guidance rather than drawing a partial map. 
-const TOPOLOGY_MAX_NODES = 5000; -const TOPOLOGY_MAX_EDGES = 15000; - function emptyResponse( layerKey: string, serviceArg: string | null, @@ -307,6 +302,7 @@ export function registerTopologyRoute(app: FastifyInstance, deps: TopologyRouteD } const cfgCurrent = deps.config.current; + const perf = cfgCurrent.performance; const opts = buildOapOpts(cfgCurrent, deps.fetch); const offset = await getServerOffsetMinutes(deps.config, deps.fetch); // Honor the SPA's topbar time picker when all three triplet @@ -425,7 +421,10 @@ export function registerTopologyRoute(app: FastifyInstance, deps: TopologyRouteD // Reject-with-guidance instead of a partial graph: too large to draw // legibly + risks OOMing the browser. UI shows a narrow-scope hint. - if (nodes.size > TOPOLOGY_MAX_NODES || calls.size > TOPOLOGY_MAX_EDGES) { + if ( + nodes.size > perf.limits.topologyMaxNodes || + calls.size > perf.limits.topologyMaxEdges + ) { return reply.send({ ...emptyResponse(layerKey, serviceArg, depth, topoCfg, true), tooLarge: { nodes: nodes.size, edges: calls.size }, @@ -523,8 +522,8 @@ export function registerTopologyRoute(app: FastifyInstance, deps: TopologyRouteD // unavailable, not zero" rather than letting an OAP 5xx read as no-traffic. 
const mstats = { failed: 0, total: 0 }; const [nodeEnv, edgeEnv] = await Promise.all([ - fetchAliasedChunks<MqeShape>(opts, nodeFragments, 150, 'NodeMetrics', 4, mstats), - fetchAliasedChunks<MqeShape>(opts, edgeFragments, 200, 'EdgeMetrics', 4, mstats), + fetchAliasedChunks<MqeShape>(opts, nodeFragments, perf.bulk.topology.nodeBulkSize, 'NodeMetrics', perf.bulk.topology.concurrency, mstats), + fetchAliasedChunks<MqeShape>(opts, edgeFragments, perf.bulk.topology.edgeBulkSize, 'EdgeMetrics', perf.bulk.topology.concurrency, mstats), ]); for (const [alias, shape] of Object.entries(nodeEnv)) { diff --git a/apps/bff/src/http/query/trace.ts b/apps/bff/src/http/query/trace.ts index 9f03e0a..896cbb5 100644 --- a/apps/bff/src/http/query/trace.ts +++ b/apps/bff/src/http/query/trace.ts @@ -73,13 +73,13 @@ const DEFAULT_WINDOW_MIN = 30; const MAX_WINDOW_MIN = 60 * 24 * 7; // 1 week guard /** OAP feeds `paging.pageSize` straight to its storage layer as a * LIMIT clause (PaginationUtils.java). A direct API caller could - * otherwise pass `pageSize: 100000` and exhaust the backend. The UI - * picker caps at 200 — match that server-side, allowing graceful - * defaulting when the body omits or mangles the field. */ -const MAX_TRACE_PAGE_SIZE = 200; -function clampPageSize(requested: number | undefined, fallback: number): number { + * otherwise pass `pageSize: 100000` and exhaust the backend. The cap + * is `performance.limits.maxPageSize.traces` (default 100) — match the + * UI picker server-side, allowing graceful defaulting when the body + * omits or mangles the field. 
*/ +function clampPageSize(requested: number | undefined, fallback: number, max: number): number { if (!Number.isFinite(requested as number) || (requested as number) < 1) return fallback; - return Math.min(MAX_TRACE_PAGE_SIZE, Math.round(requested as number)); + return Math.min(max, Math.round(requested as number)); } // Traces are RECORD-style data and have no metric-bucket cap on OAP // (`DurationUtils.MAX_TIME_RANGE` only applies to metric queries via @@ -267,6 +267,7 @@ function buildTraceCondition( resolvedServiceId: string | null, w: { start: string; end: string }, coldStage: boolean, + maxPageSize: number, ) { return { ...(resolvedServiceId ? { serviceId: resolvedServiceId } : {}), @@ -289,7 +290,7 @@ function buildTraceCondition( // OAP forwards `pageSize` straight to storage as a LIMIT // (PaginationUtils.java). The UI picker caps at 200; mirror that // server-side so the cap holds against direct API callers. - pageSize: clampPageSize(body.pageSize, 20), + pageSize: clampPageSize(body.pageSize, 20, maxPageSize), }, }; } @@ -300,6 +301,7 @@ async function fetchNativeList( layerKey: string, coldStage: boolean, offsetMinutes: number, + maxPageSize: number, ): Promise<NativeTraceListResponse> { const api = await detectTraceQueryApi(opts); // Explicit start+end takes precedence over windowMinutes; falling @@ -322,7 +324,7 @@ async function fetchNativeList( error: err instanceof Error ? 
err.message : String(err), }; } - const condition = buildTraceCondition(body, serviceId, window, coldStage); + const condition = buildTraceCondition(body, serviceId, window, coldStage, maxPageSize); try { if (api === 'queryTraces') { const env = await graphqlPost<{ @@ -383,13 +385,14 @@ async function fetchNativeList( async function fetchZipkinList( opts: GraphqlOptions, body: TraceListBody, + maxPageSize: number, ): Promise<ZipkinTraceListResponse> { try { const traces = await zipkinFetchTraces(opts, { serviceName: body.service, minDuration: body.minTraceDuration, maxDuration: body.maxTraceDuration, - limit: clampPageSize(body.pageSize, 20), + limit: clampPageSize(body.pageSize, 20, maxPageSize), }); return { source: 'zipkin', traces, reachable: true }; } catch (err) { @@ -434,6 +437,7 @@ export function registerTraceRoutes(app: FastifyInstance, deps: TraceRouteDeps): const requestedSource: TraceSource = body.source ?? tracesCfg.source; const opts = buildOapOpts(deps.config.current, deps.fetch); const offset = await getServerOffsetMinutes(deps.config, deps.fetch); + const maxPageSize = deps.config.current.performance.limits.maxPageSize.traces; const wantNative = requestedSource === 'both' || requestedSource === 'native'; const wantZipkin = requestedSource === 'both' || requestedSource === 'zipkin'; @@ -441,9 +445,9 @@ export function registerTraceRoutes(app: FastifyInstance, deps: TraceRouteDeps): // response — the UI's empty / error states cover each slot. const [native, zipkin] = await Promise.all([ wantNative - ? fetchNativeList(opts, body, layerKey, !!req.coldStage, offset) + ? fetchNativeList(opts, body, layerKey, !!req.coldStage, offset, maxPageSize) : Promise.resolve(undefined), - wantZipkin ? fetchZipkinList(opts, body) : Promise.resolve(undefined), + wantZipkin ? 
fetchZipkinList(opts, body, maxPageSize) : Promise.resolve(undefined), ]); const response: TraceListResponse = { diff --git a/apps/bff/src/logic/infra-3d/types.ts b/apps/bff/src/logic/infra-3d/types.ts index cc97479..eb93a93 100644 --- a/apps/bff/src/logic/infra-3d/types.ts +++ b/apps/bff/src/logic/infra-3d/types.ts @@ -116,19 +116,6 @@ export interface InfraEdgeStyle { arrow: boolean; } -export interface InfraPipelineLimits { - /** Service-bundles per MQE batch in stage 5. Mirrors the existing - * landing / dashboard chunking constant (6) so the 3D map shares the - * same OAP back-pressure profile. */ - metricChunkSize: number; - /** Max concurrent metric-chunk requests in stage 5 (each still ≤ metricChunkSize services). */ - metricConcurrency: number; - /** Max concurrent `getServicesTopology` calls in stage 3. */ - topologyConcurrency: number; - /** Max concurrent `getLayerTemplate` calls in stage 2. */ - templateConcurrency: number; -} - export interface Infra3dConfig { filter: { /** Global layer regex applied before levelling. Default `.*`. */ @@ -139,7 +126,6 @@ export interface Infra3dConfig { crossLevelCall: InfraEdgeStyle; intraCall: InfraEdgeStyle; }; - pipeline: InfraPipelineLimits; /** Where to put OAP layers that don't appear in any level's explicit * `layers` list and don't match any level's regex. The cube renders * with a small `badge` chip so the admin notices. */ diff --git a/apps/bff/src/logic/infra-3d/validate.ts b/apps/bff/src/logic/infra-3d/validate.ts index 6bf50e7..e6a61d2 100644 --- a/apps/bff/src/logic/infra-3d/validate.ts +++ b/apps/bff/src/logic/infra-3d/validate.ts @@ -105,19 +105,12 @@ const configSchema = z intraCall: edgeStyleSchema, }) .strict(), - pipeline: z - .object({ - // Cap matches the metrics route's MAX_SERVICES (infra-3d-metrics.ts): - // each metric chunk is one GraphQL request, and OAP's complexity - // ceiling 5xx's beyond 12 services. 
A larger chunk size makes every - // oversized request fail, so reject it at config-save time. - metricChunkSize: z.number().int().min(1).max(12), - // Concurrent chunks in flight (each still ≤ chunkSize); default 4 for older configs. - metricConcurrency: z.number().int().min(1).max(8).default(4), - topologyConcurrency: z.number().int().min(1).max(16), - templateConcurrency: z.number().int().min(1).max(32), - }) - .strict(), + // Deprecated + ignored. The metric fan-out budget moved to horizon.yaml + // (performance.bulk.infra3d) — the config endpoint injects the live value + // server-side. Accepted here (rather than rejected by `.strict()`) so a + // stale saved / imported row that still carries the block keeps loading; + // the value is unused. + pipeline: z.unknown().optional(), unknownLayer: z .object({ level: z.string().min(1), diff --git a/apps/ui/src/layer/browser-errors/LayerBrowserErrorsView.vue b/apps/ui/src/layer/browser-errors/LayerBrowserErrorsView.vue index 003e454..796b570 100644 --- a/apps/ui/src/layer/browser-errors/LayerBrowserErrorsView.vue +++ b/apps/ui/src/layer/browser-errors/LayerBrowserErrorsView.vue @@ -133,7 +133,7 @@ const endMsRef = computed<number | null>(() => ); const windowMinutesEffective = computed<number>(() => (isCustomRange.value ? 0 : windowMinutes.value)); const page = ref(1); -const pageSize = ref(100); +const pageSize = ref(30); // The query always pulls every category; the legend filters the stream // client-side (mirrors the Logs legend) so the chips can show full // per-category counts regardless of which one is selected. 
@@ -202,7 +202,7 @@ watch(serviceName, () => { selectedVersionId.value = ''; clearPage(); }); -watch([serviceName, windowMinutes, customStart, customEnd, selectedVersionId, selectedPageId], () => { +watch([serviceName, windowMinutes, customStart, customEnd, selectedVersionId, selectedPageId, pageSize], () => { page.value = 1; }); // Collapse the open row + its resolution whenever a fresh result set @@ -488,6 +488,15 @@ function loc(row: BrowserErrorRow): string { <option :value="CUSTOM_RANGE_SENTINEL">{{ t('Custom…') }}</option> </select> </label> + <label class="cf"> + <span>{{ t('Page size') }}</span> + <select v-model.number="pageSize" class="cf-input"> + <option :value="20">20</option> + <option :value="30">30</option> + <option :value="50">50</option> + <option :value="100">100</option> + </select> + </label> </div> <SourceMapManager v-if="showMaps" diff --git a/apps/ui/src/layer/logs/LayerLogsView.vue b/apps/ui/src/layer/logs/LayerLogsView.vue index a254a5c..38bc826 100644 --- a/apps/ui/src/layer/logs/LayerLogsView.vue +++ b/apps/ui/src/layer/logs/LayerLogsView.vue @@ -750,6 +750,7 @@ function jumpToTrace(traceId: string, ts?: number): void { <span>Page size</span> <select v-model.number="pageSize" class="cf-input"> <option :value="20">20</option> + <option :value="30">30</option> <option :value="50">50</option> <option :value="100">100</option> </select> diff --git a/apps/ui/src/layer/traces/LayerTracesView.vue b/apps/ui/src/layer/traces/LayerTracesView.vue index 89a4571..9865aa9 100644 --- a/apps/ui/src/layer/traces/LayerTracesView.vue +++ b/apps/ui/src/layer/traces/LayerTracesView.vue @@ -1020,11 +1020,10 @@ onBeforeUnmount(() => window.removeEventListener('keydown', onPageKeyDown, true) <label class="cf" :title="t('Cap on trace rows returned (default 30).')"> <span>{{ t('Limit') }}</span> <select v-model.number="limit" class="cf-input"> - <option :value="10">10</option> + <option :value="20">20</option> <option :value="30">30</option> <option 
:value="50">50</option> <option :value="100">100</option> - <option :value="200">200</option> </select> </label> <label class="cf" :class="{ 'cf-wide': isCustomRange }"> diff --git a/apps/ui/src/layer/traces/LayerZipkinTracesView.vue b/apps/ui/src/layer/traces/LayerZipkinTracesView.vue index c064b27..eb4792a 100644 --- a/apps/ui/src/layer/traces/LayerZipkinTracesView.vue +++ b/apps/ui/src/layer/traces/LayerZipkinTracesView.vue @@ -569,11 +569,10 @@ function openByInput(): void { <label class="cf"> <span>{{ t('Limit') }}</span> <select v-model.number="limit" class="cf-input"> - <option :value="10">10</option> + <option :value="20">20</option> <option :value="30">30</option> <option :value="50">50</option> <option :value="100">100</option> - <option :value="200">200</option> </select> </label> <!-- Time range pinned to its own final row so the (optional) diff --git a/docs/operate/infra-3d-map.md b/docs/operate/infra-3d-map.md index 92f0201..c43cb34 100644 --- a/docs/operate/infra-3d-map.md +++ b/docs/operate/infra-3d-map.md @@ -187,12 +187,12 @@ to publish. Import never writes OAP directly, and a file that isn't a valid ### Tuning the metric fan-out -The Metrics step loads each layer's traffic numbers in batches, several at once. How aggressively it does this is governed by a small `pipeline` block in the map configuration. These fields are **not** surfaced in the structured editor — they are tuned only by editing the exported configuration JSON and importing it back (or by hand-editing the bundled default before deploying): +The map's loading stages run in batches, several requests at once. How aggressively they do this is governed by the `performance.bulk.infra3d` block in [`horizon.yaml`](../setup/horizon-yaml.md#performance-tuning) — an operator setting, not part of the map configuration, so it is **not** in the structured editor and does **not** travel with an exported / imported map. 
Edit `horizon.yaml`; the change is hot-reloaded and takes effect the next time the map is opened: - `metricConcurrency` — how many metric batches load at the same time. Default `4`, range `1`–`8`. Raise it to fill the cubes faster on a large deployment when OAP has headroom; lower it (toward `1`) if a busy OAP rejects or slows the burst of metric requests during the Metrics step. -- `metricChunkSize` — how many services share one metric request. Range `1`–`12`. Larger chunks mean fewer requests, but OAP rejects an oversized request, so this is capped — leave it at the default unless you have a reason to change it. -- `topologyConcurrency` — how many layer call-graphs load at once during the Topologies step. Range `1`–`16`. -- `templateConcurrency` — how many layer templates load at once during the Templates step. Range `1`–`32`. +- `metricBulkSize` — how many services share one metric request. Default `6`, range `1`–`12`. Larger means fewer requests, but OAP rejects an oversized request, so this is capped — leave it at the default unless you have a reason to change it. +- `topologyConcurrency` — how many layer call-graphs load at once during the Topologies step. Default `4`, range `1`–`16`. +- `templateConcurrency` — how many layer templates load at once during the Templates step. Default `8`, range `1`–`32`. The defaults are tuned for a typical deployment; only revisit these if the loading timeline stalls on the Metrics, Topologies, or Templates step, or if OAP returns errors under the load. diff --git a/docs/setup/container-image.md b/docs/setup/container-image.md index a2f4921..7053553 100644 --- a/docs/setup/container-image.md +++ b/docs/setup/container-image.md @@ -54,6 +54,21 @@ The four `HORIZON_*_FILE` env vars seed the **defaults** the config schema uses `server.host` and `server.port` come from the YAML when present. If they are omitted, the image supplies defaults via `HORIZON_SERVER_HOST=0.0.0.0` and `HORIZON_SERVER_PORT=8081`. 
The image sets `EXPOSE 8081`; if you change `server.port`, also publish the new port. +## Memory & sizing + +The BFF holds its **source-map cache in the Node heap** — uploaded Browser-Logs maps live in process memory, not in OAP — so the container's memory limit and Node's heap limit must be sized together with the source-map budget. + +- Set **`NODE_OPTIONS=--max-old-space-size=<MB>`** to match the container memory limit (leave headroom for the rest of the process — a value somewhat below the container limit, e.g. `1536` for a 2 GiB container). `--max-old-space-size` is a **process flag read by V8 before any config loads**, so it is **not** a `horizon.yaml` field — pass it via `NODE_OPTIONS` (env), not in the YAML. +- Size **`sourceMaps.maxTotalBytes`** to fit comfortably inside that heap. A few recently-resolved maps are also kept *parsed* (larger than the raw file), so budget roughly 2× headroom above `maxTotalBytes`. Mounted (static) maps are disk-backed and don't count against the heap. See [Browser Logs & Source Maps](../operate/browser-source-maps.md). + +```sh +docker run -d --name horizon \ + -p 8081:8081 \ + -e NODE_OPTIONS=--max-old-space-size=1536 \ + -v "$PWD/horizon.yaml:/app/horizon.yaml:ro" \ + ghcr.io/apache/skywalking-horizon-ui:0.7.0 +``` + ## How to load `horizon.yaml` into the container Three common approaches. diff --git a/docs/setup/horizon-yaml.md b/docs/setup/horizon-yaml.md index 99a208c..7d83769 100644 --- a/docs/setup/horizon-yaml.md +++ b/docs/setup/horizon-yaml.md @@ -16,6 +16,7 @@ This page is the top-level map. Each subsection has its own detail page: | `debugLog` | Wire-level request/response log for troubleshooting. | [debugLog](debug-log.md) | | `query` | Per-request query limits (the layer-landing service cap). | [below](#query-limits) | | `sourceMaps` | In-memory source-map budgets + static mount for the Browser Logs tab. 
| [Browser Logs & Source Maps](../operate/browser-source-maps.md) | +| `performance` | How hard the BFF fans queries out to OAP, plus render / per-request record caps. | [below](#performance-tuning) | | `layers` | Layers to hide from the sidebar. | [below](#excluded-layers) | ## Top-level shape @@ -48,6 +49,18 @@ setup: { file? } alarms: { file? } debugLog: { enabled?, file?, maxBodyChars?, redactAuthHeaders? } sourceMaps: { enabled?, maxFileBytes?, maxTotalBytes?, maxFileCount?, bootMountDir? } + +performance: + bulk: + topology: { nodeBulkSize?, edgeBulkSize?, concurrency? } + infra3d: { metricBulkSize?, metricConcurrency?, topologyConcurrency?, templateConcurrency? } + landing: { bulkSize?, concurrency? } + dashboard: { bulkSize? } + limits: + topologyMaxNodes?: number + topologyMaxEdges?: number + maxPageSize: { traces?, logs?, browserLogs? } + layers: { excluded?: [{ key, reason? }] } ``` @@ -135,6 +148,53 @@ cap and pair it with a tighter OAP rate limit. Hot-reloadable — a change takes effect on the next landing request. +## Performance tuning + +```yaml +performance: + bulk: + topology: { nodeBulkSize: 150, edgeBulkSize: 200, concurrency: 4 } + infra3d: { metricBulkSize: 6, metricConcurrency: 4, topologyConcurrency: 4, templateConcurrency: 8 } + landing: { bulkSize: 6, concurrency: 8 } + dashboard: { bulkSize: 6 } + limits: + topologyMaxNodes: 5000 + topologyMaxEdges: 15000 + maxPageSize: { traces: 100, logs: 100, browserLogs: 100 } +``` + +The `performance` block tunes how hard Horizon drives your OAP and storage backend. **Every default equals the built-in value, so the whole block is optional** — omit it and Horizon behaves exactly as it does without it. Every value is also **clamped to a hard ceiling**: a number above the ceiling is pulled back down to it (config can only lower the load below a built-in limit, never raise it past one). Hot-reloadable — a change takes effect on the next request of that kind. 
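Reviewer sketch of the "clamped to a hard ceiling" rule described above, using the `bulk.infra3d` defaults and ranges quoted in `docs/operate/infra-3d-map.md` in this diff. The resolver name `resolveInfra3dBulk` and the two constant tables are hypothetical, and the fallback to the default for a missing or invalid value is an assumption; the real schema in `apps/bff/src/config/schema.ts` may reject such input instead.

```typescript
// Illustrative only: not the schema shipped in this commit.
interface Infra3dBulk {
  metricBulkSize: number;
  metricConcurrency: number;
  topologyConcurrency: number;
  templateConcurrency: number;
}

// Defaults as documented for performance.bulk.infra3d.
const INFRA3D_DEFAULTS: Infra3dBulk = {
  metricBulkSize: 6,
  metricConcurrency: 4,
  topologyConcurrency: 4,
  templateConcurrency: 8,
};

// Upper bounds of the documented ranges (1-12, 1-8, 1-16, 1-32).
const INFRA3D_CEILINGS: Infra3dBulk = {
  metricBulkSize: 12,
  metricConcurrency: 8,
  topologyConcurrency: 16,
  templateConcurrency: 32,
};

function resolveInfra3dBulk(input: Partial<Infra3dBulk> = {}): Infra3dBulk {
  const resolved: Infra3dBulk = { ...INFRA3D_DEFAULTS };
  for (const key of Object.keys(INFRA3D_DEFAULTS) as (keyof Infra3dBulk)[]) {
    const value = input[key];
    // Assumption: anything that is not a positive integer keeps the default.
    if (typeof value === 'number' && Number.isInteger(value) && value >= 1) {
      // A value above the ceiling is pulled back down to the ceiling.
      resolved[key] = Math.min(value, INFRA3D_CEILINGS[key]);
    }
  }
  return resolved;
}
```

Under these assumptions, `resolveInfra3dBulk({ metricBulkSize: 50 })` yields `metricBulkSize: 12` and leaves the other three fields at their defaults, which matches the "omit the block and nothing changes" behavior the section describes.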
+ +The rule of thumb: **raise these on a beefy OAP with a fast storage backend** that can absorb more parallel queries (you'll fill pages and maps faster); **lower them on a modest deployment** where a busy OAP rejects or slows under the burst. + +### `performance.bulk` — query fan-out + +These govern how Horizon batches and parallelizes its metric queries to OAP. Each family has a **bulk size** (how many metric expressions ride in one OAP request — fewer, larger requests vs. more, smaller ones) and most have a **concurrency** (how many of those requests are in flight at once). + +| Section | Tunes | Defaults | +|---|---|---| +| `bulk.topology` | The service-map family (topology, instance topology, deployment, endpoint dependency) node/edge metric fan-out. | `nodeBulkSize: 150`, `edgeBulkSize: 200`, `concurrency: 4` | +| `bulk.infra3d` | The 3D Infrastructure Map's metric, topology, and template loading. | `metricBulkSize: 6`, `metricConcurrency: 4`, `topologyConcurrency: 4`, `templateConcurrency: 8` | +| `bulk.landing` | The per-layer landing's service-column metric batches. | `bulkSize: 6`, `concurrency: 8` | +| `bulk.dashboard` | A dashboard's widget metric fan-out. | `bulkSize: 6` | + +- **Raise `concurrency` / `*Concurrency`** to load a large topology, 3D map, landing, or dashboard faster when OAP has headroom. **Lower it** (toward `1`) if OAP rejects or slows under the burst of parallel requests. +- **Bulk sizes** trade request count against request size: a larger bulk means fewer, fatter OAP requests. OAP rejects an oversized request, so each bulk size is capped — leave it at the default unless you have a specific reason to change it. +- For the 3D map specifically, these knobs are also described in context on the [3D Infrastructure Map](../operate/infra-3d-map.md) page. 
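Reviewer sketch of how a bulk size and a concurrency interact, as the bullets above describe. This is a minimal illustration, not the BFF's fan-out code: `toBulks`, `fanOut`, and the `task` callback are hypothetical names, and the task stands in for one OAP request.

```typescript
// Illustrative only: helper names are not taken from the Horizon codebase.

/** Split items into consecutive groups of at most `bulkSize` entries. */
function toBulks<T>(items: readonly T[], bulkSize: number): T[][] {
  if (!Number.isInteger(bulkSize) || bulkSize < 1) {
    throw new RangeError('bulkSize must be a positive integer');
  }
  const bulks: T[][] = [];
  for (let i = 0; i < items.length; i += bulkSize) {
    bulks.push(items.slice(i, i + bulkSize));
  }
  return bulks;
}

/**
 * Run `task` once per bulk with at most `concurrency` calls in flight.
 * Results come back in bulk order, regardless of completion order.
 */
async function fanOut<T, R>(
  items: readonly T[],
  bulkSize: number,
  concurrency: number,
  task: (bulk: T[]) => Promise<R>,
): Promise<R[]> {
  const bulks = toBulks(items, bulkSize);
  const results: R[] = new Array(bulks.length);
  let next = 0;

  // Each worker claims the next unprocessed bulk until none remain.
  async function worker(): Promise<void> {
    while (next < bulks.length) {
      const index = next++;
      results[index] = await task(bulks[index]!);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, bulks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With the landing defaults (`bulkSize: 6`, `concurrency: 8`), 60 services would become 10 requests with up to 8 in flight at once. Raising the bulk size reduces the request count; raising the concurrency shortens the wall-clock time while increasing the burst OAP sees.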
+ +### `performance.limits` — render & record caps + +| Field | Caps | Default | +|---|---|---| +| `topologyMaxNodes` | The render valve for a service map — a graph with more nodes than this is **rejected with a "narrow the scope" notice** rather than drawn as an unreadable hairball. | `5000` | +| `topologyMaxEdges` | The same valve on edges. | `15000` | +| `maxPageSize.traces` | The maximum **records** fetched per Traces request (the storage `LIMIT`, not a page count). The page-size picker on the page maxes at this same value, so a client can't out-ask the dropdown. | `100` | +| `maxPageSize.logs` | The same per-request record cap for Logs. | `100` | +| `maxPageSize.browserLogs` | The same per-request record cap for Browser Logs. | `100` | + +- **`topologyMaxNodes` / `topologyMaxEdges`** are a readability and safety valve, not a data limit — if your deployment legitimately has a graph this large, raising them lets it render (at the cost of a denser scene and a heavier draw). Lower them if you'd rather force operators to scope down sooner. +- **`maxPageSize.*`** bound how many rows one Traces / Logs / Browser-Logs request pulls from storage. Some storage backends fail or slow on large list queries — lower these to keep list pages cheap on a constrained backend; raise them (up to the ceiling) if your backend serves big result sets comfortably and operators want more rows per fetch. + ## Excluded layers ```yaml diff --git a/horizon.example.yaml b/horizon.example.yaml index 1c9d8a3..7bdd2ef 100644 --- a/horizon.example.yaml +++ b/horizon.example.yaml @@ -148,6 +148,7 @@ rbac: - topology:read - profile:read - overview:read + - infra-3d:read # Viewer + platform monitoring (OAP cluster + module inspector). maintainer: @@ -160,9 +161,10 @@ rbac: - profile:read - overview:read - cluster:read + - inspect:read - ttl:read - config:read - - inspect:read + - infra-3d:read # Configures observability: dashboards, alarm rules, DSL/OAL, # diagnostics, profiling. 
Inherits viewer + platform reads so the @@ -177,9 +179,9 @@ rbac: - topology:read - profile:read - cluster:read + - inspect:read - ttl:read - config:read - - inspect:read - overview:read - overview:write - setup:read @@ -190,6 +192,7 @@ rbac: - alarm-setup:write - alarm-rule:read - alarm-rule:write + - infra-3d:read - rule:read - rule:write - rule:write:structural @@ -247,3 +250,36 @@ sourceMaps: # published image sets HORIZON_SOURCEMAPS_DIR=/app/sourcemaps; leave empty # to disable the static mount. bootMountDir: ${HORIZON_SOURCEMAPS_DIR:} + +# ──────────────────────────────────────────────────────────────────── +# Performance / behavior tuning — how hard the BFF fans queries out to +# OAP, and the caps that protect storage. OPERATIONAL (per-deployment, +# hot-reloaded, never published to OAP), unlike dashboard content, which +# lives in templates. The whole block is optional — defaults equal the +# built-in values, shown here for reference. Every value is clamped to a +# hard ceiling; config can lower it, never raise it past that. +# +# Node heap: the BFF holds the source-map cache (above) in process memory, +# so size the container memory limit and `NODE_OPTIONS=--max-old-space-size` +# to your sourceMaps budget. (--max-old-space-size is a process flag, not a +# config field — V8 reads it before this file loads.) +performance: + bulk: + # Service-map family (topology / instance-topology / deployment / + # endpoint-dependency): bulkSize = aliased MQE fragments per OAP + # request; concurrency = parallel requests. + topology: { nodeBulkSize: 150, edgeBulkSize: 200, concurrency: 4 } + # 3D infrastructure-map metric fan-out (was the 3D template `pipeline`). + infra3d: { metricBulkSize: 6, metricConcurrency: 4, topologyConcurrency: 4, templateConcurrency: 8 } + # Per-layer landing metric-column batches. + landing: { bulkSize: 6, concurrency: 8 } + # Dashboard widget metric fan-out. 
+ dashboard: { bulkSize: 6 } + limits: + # Service-map render valve — a larger graph is rejected with a + # "narrow the scope" notice rather than drawn unreadably. + topologyMaxNodes: 5000 + topologyMaxEdges: 15000 + # Max RECORDS per request for each event list (the OAP storage LIMIT) + # — not a page count; the UI picker maxes at the same value. + maxPageSize: { traces: 100, logs: 100, browserLogs: 100 }
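Reviewer sketch of the per-request record cap. The trace route in this diff calls `clampPageSize(body.pageSize, 20, maxPageSize)`, but the function body is not part of the patch, so the implementation below is an assumption that only mirrors the call shape (requested value, fallback, operator cap).

```typescript
// Illustrative only: the real clampPageSize body is not shown in this diff.
function clampPageSize(requested: unknown, fallback: number, max: number): number {
  // Assumption: missing, non-numeric, or non-positive input uses the fallback.
  const wanted =
    typeof requested === 'number' && Number.isFinite(requested) && requested >= 1
      ? Math.floor(requested)
      : fallback;
  // Neither the request nor the fallback may exceed the operator's cap.
  return Math.min(wanted, max);
}
```

Under these assumptions, a client asking for 500 rows with `maxPageSize.traces: 100` gets 100, a request without a page size gets 20, and lowering the cap to 10 in `horizon.yaml` also lowers the fallback to 10.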
