GitHub user tenthe added a comment to the discussion: Moving to Flat Event
Structures in StreamPipes
Hi all,
Thanks for the constructive feedback. I think there are two topics that we
should clarify here:
- **Topic 1: Flattening Nested Event Structures**
- **Topic 2: Handling Arrays in Nested Structures**
I think **Topic 1** is fairly clear and we only need to make a few minor API
decisions. I'm still quite unsure about **Topic 2**, as there are many things
to take into account.
### Topic 1: Flattening Nested Event Structures
The primary function of the flattening feature would be to convert nested event
structures into a flat format, simplifying downstream processing. When a user
selects this feature, the event data would be automatically flattened within
the adapter before further processing. Here's a simple example of how the
flattening process might work:
#### Example:
Given a nested event structure like this:
```json
{
  "sensor_id": "sensor_1",
  "timestamp": "2024-08-20T12:34:56Z",
  "reading": {
    "temperature": 25.3,
    "humidity": 60.2
  },
  "location": {
    "latitude": 40.7128,
    "longitude": -74.0060
  }
}
```
The flattening process would transform this into:
```json
{
  "sensor_id": "sensor_1",
  "timestamp": "2024-08-20T12:34:56Z",
  "reading_temperature": 25.3,
  "reading_humidity": 60.2,
  "location_latitude": 40.7128,
  "location_longitude": -74.0060
}
```
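For illustration, the recursive flattening can be sketched in Python, assuming events arrive as parsed JSON dictionaries and `_` is the delimiter. The function name `flatten` is hypothetical and not part of the StreamPipes API:

```python
from typing import Any, Dict, Optional


def flatten(event: Dict[str, Any], delimiter: str = "_",
            prefix: Optional[str] = None) -> Dict[str, Any]:
    """Recursively flatten nested objects, joining parent and child keys with `delimiter`."""
    flat: Dict[str, Any] = {}
    for key, value in event.items():
        # Top-level keys keep their name; nested keys get the parent path as prefix.
        name = key if prefix is None else f"{prefix}{delimiter}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, delimiter, name))
        else:
            # Primitives (and, in this sketch, arrays) are copied unchanged.
            flat[name] = value
    return flat


nested = {
    "sensor_id": "sensor_1",
    "timestamp": "2024-08-20T12:34:56Z",
    "reading": {"temperature": 25.3, "humidity": 60.2},
    "location": {"latitude": 40.7128, "longitude": -74.0060},
}
print(flatten(nested))
```

Because the result only depends on the structure of the event, applying the same function to every incoming event yields the same flat schema as long as the nested structure does not change. Note that an empty nested object disappears entirely in this sketch, which is the kind of edge case T1.Q2 asks about.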
This flattened structure removes the complexity of nested objects. It would
become the new event schema, and all events would be transformed in the same
way at runtime.
#### Open Question:
- **T1.Q1:** Which delimiter should we choose?
  - Users might already use the chosen delimiter character in their field
    names, which could lead to conflicts or confusion.
  - Should we allow users to configure their preferred delimiter, or is
    there another approach to avoid potential naming conflicts?
- **T1.Q2:** Is there anything else we must consider, e.g. a case that simple
  flattening cannot cover?
- **T1.Q3:** Can we remove the old `MoveRule` once this new function exists?
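A configurable delimiter alone only moves the problem, since almost any character can appear in user field names. One complementary option would be to detect collisions when the flattened schema is created and report them, so the user can pick another delimiter or rename a field. A minimal Python sketch, assuming parsed JSON dictionaries; `flatten_checked` is a hypothetical name:

```python
from typing import Any, Dict, List


def flatten_checked(event: Dict[str, Any], delimiter: str = "_") -> Dict[str, Any]:
    """Flatten nested objects and fail if two different paths produce the same flat key."""
    flat: Dict[str, Any] = {}
    origins: Dict[str, List[str]] = {}  # flat key -> path of original keys it came from

    def walk(obj: Dict[str, Any], path: List[str]) -> None:
        for key, value in obj.items():
            current = path + [key]
            if isinstance(value, dict):
                walk(value, current)
                continue
            name = delimiter.join(current)
            if name in flat:
                # e.g. "reading_temperature" vs. {"reading": {"temperature": ...}}
                raise ValueError(
                    f"flat key {name!r} is produced by both {origins[name]} and {current}"
                )
            flat[name] = value
            origins[name] = current

    walk(event, [])
    return flat


conflicting = {"reading_temperature": 20.0, "reading": {"temperature": 25.3}}
print(flatten_checked(conflicting, delimiter="."))  # "." does not collide here
try:
    flatten_checked(conflicting, delimiter="_")
except ValueError as err:
    print("conflict:", err)
```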
### Topic 2: Handling Arrays in Nested Structures
Arrays within nested structures pose a particular challenge, especially when
they contain objects and we don't know in advance how many elements will be
present. One approach would be to iterate over the array and flatten each
object, appending the element index to the property names. However, this
raises the question of whether such a transformation always makes sense,
especially when the array length varies or the array itself contains nested
lists.
#### Example:
Consider an event with an array of objects:
```json
{
  "sensor_id": "sensor_1",
  "measurements": [
    {"type": "temperature", "value": 25.3},
    {"type": "humidity", "value": 60.2}
  ]
}
```
Flattening this could result in:
```json
{
  "sensor_id": "sensor_1",
  "measurements_0_type": "temperature",
  "measurements_0_value": 25.3,
  "measurements_1_type": "humidity",
  "measurements_1_value": 60.2
}
```
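The index-based approach can be sketched in Python, assuming parsed JSON dictionaries and `_` as the delimiter (`flatten_with_indices` is a hypothetical name). Nested lists are handled the same way, producing one index segment per nesting level:

```python
from typing import Any, Dict


def flatten_with_indices(event: Dict[str, Any], delimiter: str = "_") -> Dict[str, Any]:
    """Flatten nested objects by key and arrays by element position."""
    flat: Dict[str, Any] = {}

    def walk(value: Any, name: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{name}{delimiter}{key}")
        elif isinstance(value, list):
            # Each array element is addressed by its index: measurements_0, measurements_1, ...
            for index, item in enumerate(value):
                walk(item, f"{name}{delimiter}{index}")
        else:
            # Leaf value: store it under the accumulated path.
            flat[name] = value

    for key, value in event.items():
        walk(value, key)
    return flat


event = {
    "sensor_id": "sensor_1",
    "measurements": [
        {"type": "temperature", "value": 25.3},
        {"type": "humidity", "value": 60.2},
    ],
}
print(flatten_with_indices(event))
```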
This approach preserves the data, but the resulting structure depends on the
number of array elements. StreamPipes cannot cover this behavior, especially
when the data is stored, because we always expect a fixed event schema.
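To make the problem concrete, here is a small Python sketch that compares flattened events against a fixed set of field names. The events with one and three measurements are made-up illustrations, and `schema_diff` is a hypothetical helper, not a StreamPipes function:

```python
from typing import Any, Dict, Set, Tuple


def schema_diff(expected: Set[str], flat_event: Dict[str, Any]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) field names of a flattened event w.r.t. a fixed schema."""
    actual = set(flat_event)
    return expected - actual, actual - expected


# Schema derived from the flattened two-element example.
expected = {
    "sensor_id",
    "measurements_0_type", "measurements_0_value",
    "measurements_1_type", "measurements_1_value",
}

# Hypothetical later event with only one measurement.
shorter = {
    "sensor_id": "sensor_1",
    "measurements_0_type": "temperature",
    "measurements_0_value": 24.9,
}

# Hypothetical later event with three measurements.
longer = {
    "sensor_id": "sensor_1",
    "measurements_0_type": "temperature", "measurements_0_value": 25.1,
    "measurements_1_type": "humidity", "measurements_1_value": 61.0,
    "measurements_2_type": "pressure", "measurements_2_value": 1013.2,
}

print(schema_diff(expected, shorter))
print(schema_diff(expected, longer))
```

Events with fewer elements leave schema fields without values, while events with more elements introduce fields that the fixed schema does not contain.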
#### Open Question:
- **T2.Q1:** Should we support objects within arrays, or only support arrays
  with primitive data types?
- **T2.Q2:** How could we handle arrays with objects?
- **T2.Q3:** Can we assume that arrays always have the same number of
  elements and that their order does not change? I suspect this assumption
  rarely holds.
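If we chose the restrictive option from T2.Q1, the flattening step could keep arrays of primitive values as a single list-valued field and reject arrays containing objects or nested arrays. A minimal Python sketch of that policy; `flatten_primitive_arrays` and raising an error are assumptions for illustration, and rejecting is only one possible reaction:

```python
from typing import Any, Dict, Optional

# JSON primitive types as produced by Python's json module.
PRIMITIVES = (str, int, float, bool, type(None))


def flatten_primitive_arrays(event: Dict[str, Any], delimiter: str = "_",
                             prefix: Optional[str] = None) -> Dict[str, Any]:
    """Flatten nested objects; keep arrays of primitives as one list field, reject the rest."""
    flat: Dict[str, Any] = {}
    for key, value in event.items():
        name = key if prefix is None else f"{prefix}{delimiter}{key}"
        if isinstance(value, dict):
            flat.update(flatten_primitive_arrays(value, delimiter, name))
        elif isinstance(value, list):
            if not all(isinstance(item, PRIMITIVES) for item in value):
                # Arrays containing objects or nested arrays are out of scope for this policy.
                raise ValueError(f"unsupported array in field {name!r}")
            # The whole array stays one list-valued field, so the flat schema
            # does not depend on how many elements the array has.
            flat[name] = list(value)
        else:
            flat[name] = value
    return flat


print(flatten_primitive_arrays({"sensor_id": "sensor_1", "reading": {"values": [25.3, 25.4]}}))
```

Keeping the array as one field means events with different array lengths share the same schema; arrays of objects (T2.Q2) would still need a separate answer.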
---
Looking forward to further feedback.
GitHub link:
https://github.com/apache/streampipes/discussions/3150#discussioncomment-10393757