Re: [PR] Add documentation for JsonImporter [otava]

via GitHub Fri, 03 Apr 2026 15:19:43 -0700


henrikingo commented on code in PR #146:
URL: https://github.com/apache/otava/pull/146#discussion_r3034134344



##########
docs/JSON.md:
##########
@@ -0,0 +1,136 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+# JSON Data Source
+
+> **Tip**
+> See [examples/](../examples/) for sample configuration files.
+
+## Overview
+
+`JsonImporter` reads benchmark results from a local JSON file and feeds them 
into Otava for change-point analysis. It is a simple data source to set up — no 
external database or service is required.
+
+The importer caches parsed file content in memory, so a file is only read once 
per session even if multiple tests reference the same path.
+
+---
+
+## Expected JSON Format
+
+The input file must be a JSON array. Each element represents a single 
benchmark run.
+```json
+[
+  {
+    "timestamp": 1711929600,
+    "metrics": [
+      { "name": "throughput", "value": 4821.0 },
+      { "name": "p99_latency_ms", "value": 142.7 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "a3f9c12"
+    }
+  },
+  {
+    "timestamp": 1712016000,
+    "metrics": [
+      { "name": "throughput", "value": 5013.0 },
+      { "name": "p99_latency_ms", "value": 138.2 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "b7d2e45"
+    }
+  }
+]
+```
+
+---
+
+## Fields
+
+### `timestamp`
+
+- **Type:** integer (Unix epoch seconds)
+- **Required:** yes
+- Identifies when the benchmark run occurred. Used for time-range filtering 
via `DataSelector`.
+
+### `metrics`
+
+- **Type:** array of objects
+- **Required:** yes
+- Each object must have:
+  - `name` (string) — unique identifier for the metric within this run
+  - `value` (number) — the measured value
+- Metric names are collected dynamically across all entries in the file. Names 
must be consistent across runs for change-point analysis to be meaningful.
+
+### `attributes`
+
+- **Type:** object (string → string)
+- **Required:** yes if `branch` filtering is used
+- Arbitrary key-value pairs describing the run context (e.g. branch, commit, 
version).
+- The `branch` key is treated specially: if a branch is specified via 
`DataSelector` or `base_branch` in the config, only runs where 
`attributes["branch"]` matches that value are included.
+
+---
+
+## Configuration Example
+
+Add a test with `type: json` to your `otava.yaml`:
+```yaml
+tests:
+  my_benchmark:
+    type: json
+    file: otava/examples/json/data/sample.json
+    base_branch: main
+```
+
+| Field | Required | Description |
+|---|---|---|
+| `type` | yes | Must be `json` |
+| `file` | yes |  Path to the JSON file |
+| `base_branch` | no | If set, only runs from this branch are analyzed by 
default |
+
+---
+
+## Behavior
+
+- **File loading:** The file is read in full when first accessed. Parsed 
content is cached in memory for the lifetime of the session — repeated calls 
with the same file path do not re-read from disk.
+- **Metric discovery:** All metric names are collected by scanning every entry 
in the file. The resulting set is unordered.
+- **Attribute discovery:** Attribute keys are collected the same way — by 
scanning all entries.
+- **Branch filtering:** If `selector.branch` is set, only runs where 
`attributes["branch"]` equals that value are included. If not set but 
`base_branch` is configured, that value is used instead. If neither is set, all 
runs are included.
+- **Metric filtering:** If `selector.metrics` is set, only metrics whose names 
appear in that list are included. Others are silently skipped.

Review Comment:
   Note, selector.branch and selector.metrics are things internal to the code. 
For end user documentation, you should talk about the CLI options --branch  and 
--selector.
   
   I think this entire paragraph could be removed. It describes internal 
behavior, or, when it doesn't, it describes options that are common across all 
importers and not specific to JsonImporter.



##########
docs/JSON.md:
##########
@@ -0,0 +1,136 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+# JSON Data Source
+
+> **Tip**
+> See [examples/](../examples/) for sample configuration files.
+
+## Overview
+
+`JsonImporter` reads benchmark results from a local JSON file and feeds them 
into Otava for change-point analysis. It is a simple data source to set up — no 
external database or service is required.
+
+The importer caches parsed file content in memory, so a file is only read once 
per session even if multiple tests reference the same path.
+
+---
+
+## Expected JSON Format
+
+The input file must be a JSON array. Each element represents a single 
benchmark run.
+```json
+[
+  {
+    "timestamp": 1711929600,
+    "metrics": [
+      { "name": "throughput", "value": 4821.0 },
+      { "name": "p99_latency_ms", "value": 142.7 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "a3f9c12"
+    }
+  },
+  {
+    "timestamp": 1712016000,
+    "metrics": [
+      { "name": "throughput", "value": 5013.0 },
+      { "name": "p99_latency_ms", "value": 138.2 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "b7d2e45"
+    }
+  }
+]
+```
+
+---
+
+## Fields
+
+### `timestamp`
+
+- **Type:** integer (Unix epoch seconds)
+- **Required:** yes
+- Identifies when the benchmark run occurred. Used for time-range filtering 
via `DataSelector`.
+
+### `metrics`
+
+- **Type:** array of objects
+- **Required:** yes
+- Each object must have:
+  - `name` (string) — unique identifier for the metric within this run
+  - `value` (number) — the measured value

Review Comment:
   The Series class that later handles this data, would apparently also support 
a unit field. (e.g. "ms"). But unfortunately JsonImporter doesn't support that. 
We should add it in the future.
   
   You could add here a note that unfortunately `unit` key isn't currently 
supported.



##########
docs/JSON.md:
##########
@@ -0,0 +1,136 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+# JSON Data Source
+
+> **Tip**
+> See [examples/](../examples/) for sample configuration files.
+
+## Overview
+
+`JsonImporter` reads benchmark results from a local JSON file and feeds them 
into Otava for change-point analysis. It is a simple data source to set up — no 
external database or service is required.
+
+The importer caches parsed file content in memory, so a file is only read once 
per session even if multiple tests reference the same path.
+
+---
+
+## Expected JSON Format
+
+The input file must be a JSON array. Each element represents a single 
benchmark run.
+```json
+[
+  {
+    "timestamp": 1711929600,
+    "metrics": [
+      { "name": "throughput", "value": 4821.0 },
+      { "name": "p99_latency_ms", "value": 142.7 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "a3f9c12"
+    }
+  },
+  {
+    "timestamp": 1712016000,
+    "metrics": [
+      { "name": "throughput", "value": 5013.0 },
+      { "name": "p99_latency_ms", "value": 138.2 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "b7d2e45"
+    }
+  }
+]
+```
+
+---
+
+## Fields
+
+### `timestamp`
+
+- **Type:** integer (Unix epoch seconds)
+- **Required:** yes
+- Identifies when the benchmark run occurred. Used for time-range filtering 
via `DataSelector`.
+
+### `metrics`
+
+- **Type:** array of objects
+- **Required:** yes
+- Each object must have:
+  - `name` (string) — unique identifier for the metric within this run
+  - `value` (number) — the measured value
+- Metric names are collected dynamically across all entries in the file. Names 
must be consistent across runs for change-point analysis to be meaningful.

Review Comment:
   I suggest removing the first sentence. It sound like it is describing the 
intrenals of how our code works. But this is a user manual, they don't need to 
understand that. The second sentence can stay though.



##########
docs/JSON.md:
##########
@@ -0,0 +1,136 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+# JSON Data Source
+
+> **Tip**
+> See [examples/](../examples/) for sample configuration files.
+
+## Overview
+
+`JsonImporter` reads benchmark results from a local JSON file and feeds them 
into Otava for change-point analysis. It is a simple data source to set up — no 
external database or service is required.
+
+The importer caches parsed file content in memory, so a file is only read once 
per session even if multiple tests reference the same path.
+
+---
+
+## Expected JSON Format
+
+The input file must be a JSON array. Each element represents a single 
benchmark run.
+```json
+[
+  {
+    "timestamp": 1711929600,
+    "metrics": [
+      { "name": "throughput", "value": 4821.0 },
+      { "name": "p99_latency_ms", "value": 142.7 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "a3f9c12"
+    }
+  },
+  {
+    "timestamp": 1712016000,
+    "metrics": [
+      { "name": "throughput", "value": 5013.0 },
+      { "name": "p99_latency_ms", "value": 138.2 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "b7d2e45"
+    }
+  }
+]
+```
+
+---
+
+## Fields
+
+### `timestamp`
+
+- **Type:** integer (Unix epoch seconds)
+- **Required:** yes
+- Identifies when the benchmark run occurred. Used for time-range filtering 
via `DataSelector`.
+
+### `metrics`
+
+- **Type:** array of objects
+- **Required:** yes
+- Each object must have:
+  - `name` (string) — unique identifier for the metric within this run
+  - `value` (number) — the measured value
+- Metric names are collected dynamically across all entries in the file. Names 
must be consistent across runs for change-point analysis to be meaningful.
+
+### `attributes`
+
+- **Type:** object (string → string)
+- **Required:** yes if `branch` filtering is used
+- Arbitrary key-value pairs describing the run context (e.g. branch, commit, 
version).
+- The `branch` key is treated specially: if a branch is specified via 
`DataSelector` or `base_branch` in the config, only runs where 
`attributes["branch"]` matches that value are included.
+
+---
+
+## Configuration Example
+
+Add a test with `type: json` to your `otava.yaml`:
+```yaml
+tests:
+  my_benchmark:
+    type: json
+    file: otava/examples/json/data/sample.json
+    base_branch: main
+```
+
+| Field | Required | Description |
+|---|---|---|
+| `type` | yes | Must be `json` |
+| `file` | yes |  Path to the JSON file |
+| `base_branch` | no | If set, only runs from this branch are analyzed by 
default |
+
+---
+
+## Behavior
+
+- **File loading:** The file is read in full when first accessed. Parsed 
content is cached in memory for the lifetime of the session — repeated calls 
with the same file path do not re-read from disk.
+- **Metric discovery:** All metric names are collected by scanning every entry 
in the file. The resulting set is unordered.
+- **Attribute discovery:** Attribute keys are collected the same way — by 
scanning all entries.
+- **Branch filtering:** If `selector.branch` is set, only runs where 
`attributes["branch"]` equals that value are included. If not set but 
`base_branch` is configured, that value is used instead. If neither is set, all 
runs are included.
+- **Metric filtering:** If `selector.metrics` is set, only metrics whose names 
appear in that list are included. Others are silently skipped.
+- **Time filtering:** Entries outside `selector.since_time` / 
`selector.until_time` are excluded. An invalid range (since > until) raises an 
error.
+- **Truncation:** After filtering, only the last `selector.last_n_points` 
entries are kept for time, data, and attributes.
+
+---
+
+## Limitations
+
+- The entire file is read into memory at once. Very large files may cause high 
memory usage.
+- There is no schema validation. Missing or malformed fields will cause a 
`KeyError` at runtime.
+- The `branch` filter requires the key `"branch"` to exist inside `attributes` 
on every entry — if it is absent on any entry that would otherwise be included, 
the importer will raise a `KeyError`.
+- Attribute values are expected to be strings. No type coercion is performed.
+- The file path is resolved at config load time; a missing file raises a 
`TestConfigError` immediately.
+
+---
+
+## Example Usage
+
+Run analysis on a test backed by a JSON file:

Review Comment:
   Suggest to change:
   
   "Analyze test results stored in JSON format."



##########
docs/JSON.md:
##########
@@ -0,0 +1,136 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+# JSON Data Source
+
+> **Tip**
+> See [examples/](../examples/) for sample configuration files.
+
+## Overview
+
+`JsonImporter` reads benchmark results from a local JSON file and feeds them 
into Otava for change-point analysis. It is a simple data source to set up — no 
external database or service is required.
+
+The importer caches parsed file content in memory, so a file is only read once 
per session even if multiple tests reference the same path.
+
+---
+
+## Expected JSON Format
+
+The input file must be a JSON array. Each element represents a single 
benchmark run.
+```json
+[
+  {
+    "timestamp": 1711929600,
+    "metrics": [
+      { "name": "throughput", "value": 4821.0 },
+      { "name": "p99_latency_ms", "value": 142.7 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "a3f9c12"
+    }
+  },
+  {
+    "timestamp": 1712016000,
+    "metrics": [
+      { "name": "throughput", "value": 5013.0 },
+      { "name": "p99_latency_ms", "value": 138.2 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "b7d2e45"
+    }
+  }
+]
+```
+
+---
+
+## Fields
+
+### `timestamp`
+
+- **Type:** integer (Unix epoch seconds)
+- **Required:** yes
+- Identifies when the benchmark run occurred. Used for time-range filtering 
via `DataSelector`.
+
+### `metrics`
+
+- **Type:** array of objects
+- **Required:** yes
+- Each object must have:
+  - `name` (string) — unique identifier for the metric within this run
+  - `value` (number) — the measured value
+- Metric names are collected dynamically across all entries in the file. Names 
must be consistent across runs for change-point analysis to be meaningful.
+
+### `attributes`
+
+- **Type:** object (string → string)
+- **Required:** yes if `branch` filtering is used

Review Comment:
   => So really you are saying Required: no



##########
otava/examples/json/config/otava.yaml:
##########
@@ -0,0 +1,22 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+tests:
+  my_benchmark:
+    type: json
+    file: test_data/sample.json

Review Comment:
   This needs updating. The user should be able to try the provided example and 
it should work.



##########
otava/examples/json/data/sample.json:
##########
@@ -0,0 +1,11 @@
+[
+  {
+    "timestamp": 1711929600,
+    "metrics": [
+      { "name": "throughput", "value": 100 }
+    ],
+    "attributes": {
+      "branch": "main"
+    }
+  }

Review Comment:
   You'd better add more than one data point if  you want to demo change point 
detection :-)



##########
docs/JSON.md:
##########
@@ -0,0 +1,136 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+# JSON Data Source
+
+> **Tip**
+> See [examples/](../examples/) for sample configuration files.
+
+## Overview
+
+`JsonImporter` reads benchmark results from a local JSON file and feeds them 
into Otava for change-point analysis. It is a simple data source to set up — no 
external database or service is required.
+
+The importer caches parsed file content in memory, so a file is only read once 
per session even if multiple tests reference the same path.
+
+---
+
+## Expected JSON Format
+
+The input file must be a JSON array. Each element represents a single 
benchmark run.
+```json
+[
+  {
+    "timestamp": 1711929600,
+    "metrics": [
+      { "name": "throughput", "value": 4821.0 },
+      { "name": "p99_latency_ms", "value": 142.7 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "a3f9c12"
+    }
+  },
+  {
+    "timestamp": 1712016000,
+    "metrics": [
+      { "name": "throughput", "value": 5013.0 },
+      { "name": "p99_latency_ms", "value": 138.2 }
+    ],
+    "attributes": {
+      "branch": "main",
+      "commit": "b7d2e45"
+    }
+  }
+]
+```
+
+---
+
+## Fields
+
+### `timestamp`
+
+- **Type:** integer (Unix epoch seconds)
+- **Required:** yes
+- Identifies when the benchmark run occurred. Used for time-range filtering 
via `DataSelector`.
+
+### `metrics`
+
+- **Type:** array of objects
+- **Required:** yes
+- Each object must have:
+  - `name` (string) — unique identifier for the metric within this run
+  - `value` (number) — the measured value
+- Metric names are collected dynamically across all entries in the file. Names 
must be consistent across runs for change-point analysis to be meaningful.
+
+### `attributes`
+
+- **Type:** object (string → string)
+- **Required:** yes if `branch` filtering is used
+- Arbitrary key-value pairs describing the run context (e.g. branch, commit, 
version).
+- The `branch` key is treated specially: if a branch is specified via 
`DataSelector` or `base_branch` in the config, only runs where 
`attributes["branch"]` matches that value are included.
+
+---
+
+## Configuration Example
+
+Add a test with `type: json` to your `otava.yaml`:
+```yaml
+tests:
+  my_benchmark:
+    type: json
+    file: otava/examples/json/data/sample.json
+    base_branch: main
+```
+
+| Field | Required | Description |
+|---|---|---|
+| `type` | yes | Must be `json` |
+| `file` | yes |  Path to the JSON file |
+| `base_branch` | no | If set, only runs from this branch are analyzed by 
default |

Review Comment:
   Change "runs" to "results"
   
   Both are correct, but "results" it what we generally talk about. (aka test 
results, ...)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add documentation for JsonImporter [otava]

Reply via email to