henrikingo commented on code in PR #146: URL: https://github.com/apache/otava/pull/146#discussion_r3034134344
########## docs/JSON.md: ########## @@ -0,0 +1,136 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> +# JSON Data Source + +> **Tip** +> See [examples/](../examples/) for sample configuration files. + +## Overview + +`JsonImporter` reads benchmark results from a local JSON file and feeds them into Otava for change-point analysis. It is a simple data source to set up — no external database or service is required. + +The importer caches parsed file content in memory, so a file is only read once per session even if multiple tests reference the same path. + +--- + +## Expected JSON Format + +The input file must be a JSON array. Each element represents a single benchmark run. +```json +[ + { + "timestamp": 1711929600, + "metrics": [ + { "name": "throughput", "value": 4821.0 }, + { "name": "p99_latency_ms", "value": 142.7 } + ], + "attributes": { + "branch": "main", + "commit": "a3f9c12" + } + }, + { + "timestamp": 1712016000, + "metrics": [ + { "name": "throughput", "value": 5013.0 }, + { "name": "p99_latency_ms", "value": 138.2 } + ], + "attributes": { + "branch": "main", + "commit": "b7d2e45" + } + } +] +``` + +--- + +## Fields + +### `timestamp` + +- **Type:** integer (Unix epoch seconds) +- **Required:** yes +- Identifies when the benchmark run occurred. Used for time-range filtering via `DataSelector`. + +### `metrics` + +- **Type:** array of objects +- **Required:** yes +- Each object must have: + - `name` (string) — unique identifier for the metric within this run + - `value` (number) — the measured value +- Metric names are collected dynamically across all entries in the file. Names must be consistent across runs for change-point analysis to be meaningful. + +### `attributes` + +- **Type:** object (string → string) +- **Required:** yes if `branch` filtering is used +- Arbitrary key-value pairs describing the run context (e.g. branch, commit, version). +- The `branch` key is treated specially: if a branch is specified via `DataSelector` or `base_branch` in the config, only runs where `attributes["branch"]` matches that value are included. + +--- + +## Configuration Example + +Add a test with `type: json` to your `otava.yaml`: +```yaml +tests: + my_benchmark: + type: json + file: otava/examples/json/data/sample.json + base_branch: main +``` + +| Field | Required | Description | +|---|---|---| +| `type` | yes | Must be `json` | +| `file` | yes | Path to the JSON file | +| `base_branch` | no | If set, only runs from this branch are analyzed by default | + +--- + +## Behavior + +- **File loading:** The file is read in full when first accessed. Parsed content is cached in memory for the lifetime of the session — repeated calls with the same file path do not re-read from disk. +- **Metric discovery:** All metric names are collected by scanning every entry in the file. The resulting set is unordered. +- **Attribute discovery:** Attribute keys are collected the same way — by scanning all entries. +- **Branch filtering:** If `selector.branch` is set, only runs where `attributes["branch"]` equals that value are included. If not set but `base_branch` is configured, that value is used instead. If neither is set, all runs are included. +- **Metric filtering:** If `selector.metrics` is set, only metrics whose names appear in that list are included. Others are silently skipped. Review Comment: Note, selector.branch and selector.metrics are things internal to the code. For end user documentation, you should talk about the CLI options --branch and --selector. I think this entire paragraph could be removed. It describes internal behavior, or, when it doesn't, it describes options that are common across all importers and not specific to JsonImporter. ########## docs/JSON.md: ########## @@ -0,0 +1,136 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> +# JSON Data Source + +> **Tip** +> See [examples/](../examples/) for sample configuration files. + +## Overview + +`JsonImporter` reads benchmark results from a local JSON file and feeds them into Otava for change-point analysis. It is a simple data source to set up — no external database or service is required. + +The importer caches parsed file content in memory, so a file is only read once per session even if multiple tests reference the same path. + +--- + +## Expected JSON Format + +The input file must be a JSON array. Each element represents a single benchmark run. +```json +[ + { + "timestamp": 1711929600, + "metrics": [ + { "name": "throughput", "value": 4821.0 }, + { "name": "p99_latency_ms", "value": 142.7 } + ], + "attributes": { + "branch": "main", + "commit": "a3f9c12" + } + }, + { + "timestamp": 1712016000, + "metrics": [ + { "name": "throughput", "value": 5013.0 }, + { "name": "p99_latency_ms", "value": 138.2 } + ], + "attributes": { + "branch": "main", + "commit": "b7d2e45" + } + } +] +``` + +--- + +## Fields + +### `timestamp` + +- **Type:** integer (Unix epoch seconds) +- **Required:** yes +- Identifies when the benchmark run occurred. Used for time-range filtering via `DataSelector`. + +### `metrics` + +- **Type:** array of objects +- **Required:** yes +- Each object must have: + - `name` (string) — unique identifier for the metric within this run + - `value` (number) — the measured value Review Comment: The Series class that later handles this data, would apparently also support a unit field. (e.g. "ms"). But unfortunately JsonImporter doesn't support that. We should add it in the future. You could add here a note that unfortunately `unit` key isn't currently supported. ########## docs/JSON.md: ########## @@ -0,0 +1,136 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> +# JSON Data Source + +> **Tip** +> See [examples/](../examples/) for sample configuration files. + +## Overview + +`JsonImporter` reads benchmark results from a local JSON file and feeds them into Otava for change-point analysis. It is a simple data source to set up — no external database or service is required. + +The importer caches parsed file content in memory, so a file is only read once per session even if multiple tests reference the same path. + +--- + +## Expected JSON Format + +The input file must be a JSON array. Each element represents a single benchmark run. +```json +[ + { + "timestamp": 1711929600, + "metrics": [ + { "name": "throughput", "value": 4821.0 }, + { "name": "p99_latency_ms", "value": 142.7 } + ], + "attributes": { + "branch": "main", + "commit": "a3f9c12" + } + }, + { + "timestamp": 1712016000, + "metrics": [ + { "name": "throughput", "value": 5013.0 }, + { "name": "p99_latency_ms", "value": 138.2 } + ], + "attributes": { + "branch": "main", + "commit": "b7d2e45" + } + } +] +``` + +--- + +## Fields + +### `timestamp` + +- **Type:** integer (Unix epoch seconds) +- **Required:** yes +- Identifies when the benchmark run occurred. Used for time-range filtering via `DataSelector`. + +### `metrics` + +- **Type:** array of objects +- **Required:** yes +- Each object must have: + - `name` (string) — unique identifier for the metric within this run + - `value` (number) — the measured value +- Metric names are collected dynamically across all entries in the file. Names must be consistent across runs for change-point analysis to be meaningful. Review Comment: I suggest removing the first sentence. It sound like it is describing the intrenals of how our code works. But this is a user manual, they don't need to understand that. The second sentence can stay though. ########## docs/JSON.md: ########## @@ -0,0 +1,136 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> +# JSON Data Source + +> **Tip** +> See [examples/](../examples/) for sample configuration files. + +## Overview + +`JsonImporter` reads benchmark results from a local JSON file and feeds them into Otava for change-point analysis. It is a simple data source to set up — no external database or service is required. + +The importer caches parsed file content in memory, so a file is only read once per session even if multiple tests reference the same path. + +--- + +## Expected JSON Format + +The input file must be a JSON array. Each element represents a single benchmark run. +```json +[ + { + "timestamp": 1711929600, + "metrics": [ + { "name": "throughput", "value": 4821.0 }, + { "name": "p99_latency_ms", "value": 142.7 } + ], + "attributes": { + "branch": "main", + "commit": "a3f9c12" + } + }, + { + "timestamp": 1712016000, + "metrics": [ + { "name": "throughput", "value": 5013.0 }, + { "name": "p99_latency_ms", "value": 138.2 } + ], + "attributes": { + "branch": "main", + "commit": "b7d2e45" + } + } +] +``` + +--- + +## Fields + +### `timestamp` + +- **Type:** integer (Unix epoch seconds) +- **Required:** yes +- Identifies when the benchmark run occurred. Used for time-range filtering via `DataSelector`. + +### `metrics` + +- **Type:** array of objects +- **Required:** yes +- Each object must have: + - `name` (string) — unique identifier for the metric within this run + - `value` (number) — the measured value +- Metric names are collected dynamically across all entries in the file. Names must be consistent across runs for change-point analysis to be meaningful. + +### `attributes` + +- **Type:** object (string → string) +- **Required:** yes if `branch` filtering is used +- Arbitrary key-value pairs describing the run context (e.g. branch, commit, version). +- The `branch` key is treated specially: if a branch is specified via `DataSelector` or `base_branch` in the config, only runs where `attributes["branch"]` matches that value are included. + +--- + +## Configuration Example + +Add a test with `type: json` to your `otava.yaml`: +```yaml +tests: + my_benchmark: + type: json + file: otava/examples/json/data/sample.json + base_branch: main +``` + +| Field | Required | Description | +|---|---|---| +| `type` | yes | Must be `json` | +| `file` | yes | Path to the JSON file | +| `base_branch` | no | If set, only runs from this branch are analyzed by default | + +--- + +## Behavior + +- **File loading:** The file is read in full when first accessed. Parsed content is cached in memory for the lifetime of the session — repeated calls with the same file path do not re-read from disk. +- **Metric discovery:** All metric names are collected by scanning every entry in the file. The resulting set is unordered. +- **Attribute discovery:** Attribute keys are collected the same way — by scanning all entries. +- **Branch filtering:** If `selector.branch` is set, only runs where `attributes["branch"]` equals that value are included. If not set but `base_branch` is configured, that value is used instead. If neither is set, all runs are included. +- **Metric filtering:** If `selector.metrics` is set, only metrics whose names appear in that list are included. Others are silently skipped. +- **Time filtering:** Entries outside `selector.since_time` / `selector.until_time` are excluded. An invalid range (since > until) raises an error. +- **Truncation:** After filtering, only the last `selector.last_n_points` entries are kept for time, data, and attributes. + +--- + +## Limitations + +- The entire file is read into memory at once. Very large files may cause high memory usage. +- There is no schema validation. Missing or malformed fields will cause a `KeyError` at runtime. +- The `branch` filter requires the key `"branch"` to exist inside `attributes` on every entry — if it is absent on any entry that would otherwise be included, the importer will raise a `KeyError`. +- Attribute values are expected to be strings. No type coercion is performed. +- The file path is resolved at config load time; a missing file raises a `TestConfigError` immediately. + +--- + +## Example Usage + +Run analysis on a test backed by a JSON file: Review Comment: Suggest to change: "Analyze test results stored in JSON format." ########## docs/JSON.md: ########## @@ -0,0 +1,136 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> +# JSON Data Source + +> **Tip** +> See [examples/](../examples/) for sample configuration files. + +## Overview + +`JsonImporter` reads benchmark results from a local JSON file and feeds them into Otava for change-point analysis. It is a simple data source to set up — no external database or service is required. + +The importer caches parsed file content in memory, so a file is only read once per session even if multiple tests reference the same path. + +--- + +## Expected JSON Format + +The input file must be a JSON array. Each element represents a single benchmark run. +```json +[ + { + "timestamp": 1711929600, + "metrics": [ + { "name": "throughput", "value": 4821.0 }, + { "name": "p99_latency_ms", "value": 142.7 } + ], + "attributes": { + "branch": "main", + "commit": "a3f9c12" + } + }, + { + "timestamp": 1712016000, + "metrics": [ + { "name": "throughput", "value": 5013.0 }, + { "name": "p99_latency_ms", "value": 138.2 } + ], + "attributes": { + "branch": "main", + "commit": "b7d2e45" + } + } +] +``` + +--- + +## Fields + +### `timestamp` + +- **Type:** integer (Unix epoch seconds) +- **Required:** yes +- Identifies when the benchmark run occurred. Used for time-range filtering via `DataSelector`. + +### `metrics` + +- **Type:** array of objects +- **Required:** yes +- Each object must have: + - `name` (string) — unique identifier for the metric within this run + - `value` (number) — the measured value +- Metric names are collected dynamically across all entries in the file. Names must be consistent across runs for change-point analysis to be meaningful. + +### `attributes` + +- **Type:** object (string → string) +- **Required:** yes if `branch` filtering is used Review Comment: => So really you are saying Required: no ########## otava/examples/json/config/otava.yaml: ########## @@ -0,0 +1,22 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +tests: + my_benchmark: + type: json + file: test_data/sample.json Review Comment: This needs updating. The user should be able to try the provided example and it should work. ########## otava/examples/json/data/sample.json: ########## @@ -0,0 +1,11 @@ +[ + { + "timestamp": 1711929600, + "metrics": [ + { "name": "throughput", "value": 100 } + ], + "attributes": { + "branch": "main" + } + } Review Comment: You'd better add more than one data point if you want to demo change point detection :-) ########## docs/JSON.md: ########## @@ -0,0 +1,136 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> +# JSON Data Source + +> **Tip** +> See [examples/](../examples/) for sample configuration files. + +## Overview + +`JsonImporter` reads benchmark results from a local JSON file and feeds them into Otava for change-point analysis. It is a simple data source to set up — no external database or service is required. + +The importer caches parsed file content in memory, so a file is only read once per session even if multiple tests reference the same path. + +--- + +## Expected JSON Format + +The input file must be a JSON array. Each element represents a single benchmark run. +```json +[ + { + "timestamp": 1711929600, + "metrics": [ + { "name": "throughput", "value": 4821.0 }, + { "name": "p99_latency_ms", "value": 142.7 } + ], + "attributes": { + "branch": "main", + "commit": "a3f9c12" + } + }, + { + "timestamp": 1712016000, + "metrics": [ + { "name": "throughput", "value": 5013.0 }, + { "name": "p99_latency_ms", "value": 138.2 } + ], + "attributes": { + "branch": "main", + "commit": "b7d2e45" + } + } +] +``` + +--- + +## Fields + +### `timestamp` + +- **Type:** integer (Unix epoch seconds) +- **Required:** yes +- Identifies when the benchmark run occurred. Used for time-range filtering via `DataSelector`. + +### `metrics` + +- **Type:** array of objects +- **Required:** yes +- Each object must have: + - `name` (string) — unique identifier for the metric within this run + - `value` (number) — the measured value +- Metric names are collected dynamically across all entries in the file. Names must be consistent across runs for change-point analysis to be meaningful. + +### `attributes` + +- **Type:** object (string → string) +- **Required:** yes if `branch` filtering is used +- Arbitrary key-value pairs describing the run context (e.g. branch, commit, version). +- The `branch` key is treated specially: if a branch is specified via `DataSelector` or `base_branch` in the config, only runs where `attributes["branch"]` matches that value are included. + +--- + +## Configuration Example + +Add a test with `type: json` to your `otava.yaml`: +```yaml +tests: + my_benchmark: + type: json + file: otava/examples/json/data/sample.json + base_branch: main +``` + +| Field | Required | Description | +|---|---|---| +| `type` | yes | Must be `json` | +| `file` | yes | Path to the JSON file | +| `base_branch` | no | If set, only runs from this branch are analyzed by default | Review Comment: Change "runs" to "results" Both are correct, but "results" it what we generally talk about. (aka test results, ...) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
