GitHub user tenthe added a comment to the discussion: Moving to Flat Event 
Structures in StreamPipes

Hi all,

thanks for the constructive feedback. I think there are two topics that we 
should clarify here:
- **Topic 1: Flattening Nested Event Structures**
- **Topic 2: Handling Arrays in Nested Structures**

I think **Topic 1** is fairly clear and we only have to make a few minor API 
decisions. I'm still unsure about **Topic 2**, as there are many things that 
need to be taken into account.

### Topic 1: Flattening Nested Event Structures

The primary function of the flattening feature would be to convert nested event 
structures into a flat format, simplifying downstream processing. When a user 
selects this feature, the event data would be automatically flattened within 
the adapter before further processing. Here's a simple example of how the 
flattening process might work:

#### Example:

Given a nested event structure like this:

```json
{
  "sensor_id": "sensor_1",
  "timestamp": "2024-08-20T12:34:56Z",
  "reading": {
    "temperature": 25.3,
    "humidity": 60.2
  },
  "location": {
    "latitude": 40.7128,
    "longitude": -74.0060
  }
}
```

The flattening process would transform this into:

```json
{
  "sensor_id": "sensor_1",
  "timestamp": "2024-08-20T12:34:56Z",
  "reading_temperature": 25.3,
  "reading_humidity": 60.2,
  "location_latitude": 40.7128,
  "location_longitude": -74.0060
}
```

This flattened structure removes the complexity of nested objects. It would 
become the new event schema, and all events would be transformed in the same 
way at runtime.
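
To make the transformation above concrete, here is a minimal Python sketch of 
the flattening step. It is only an illustration, not the StreamPipes 
implementation: the function name `flatten_event` and the `delimiter` parameter 
are assumptions, and `_` is used as the delimiter only because the example 
above uses it.

```python
from typing import Any, Dict


def flatten_event(event: Dict[str, Any], delimiter: str = "_",
                  prefix: str = "") -> Dict[str, Any]:
    """Recursively flatten nested objects; all other values are copied as-is.

    Hypothetical helper for illustration, not part of StreamPipes.
    """
    flat = {}
    for key, value in event.items():
        # Join the parent path and the current key with the delimiter.
        name = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_event(value, delimiter, name))
        else:
            flat[name] = value
    return flat


nested = {
    "sensor_id": "sensor_1",
    "timestamp": "2024-08-20T12:34:56Z",
    "reading": {"temperature": 25.3, "humidity": 60.2},
    "location": {"latitude": 40.7128, "longitude": -74.0060},
}

flat = flatten_event(nested)
print(flat)
```

Because the same function would be applied to the event schema and to every 
event at runtime, the resulting field names stay consistent.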

#### Open Questions:

- **T1.Q1:** Which delimiter should we choose?
  - Users might already have delimiters in their field names, which could 
    lead to conflicts or confusion.
  - Should we allow users to configure their preferred delimiter, or is 
    there another approach to avoid potential naming conflicts?
- **T1.Q2:** Is there anything else we must consider (e.g., a case that cannot 
  easily be covered)?
- **T1.Q3:** Can we remove the old `MoveRule` with this new function?
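
To illustrate the conflict behind **T1.Q1**, here is a small Python sketch that 
reports flattened names produced by more than one source path. The function 
name `find_collisions` and the sample event are hypothetical. A check like this 
is just one possible way to detect the problem, not a proposal for the final 
API.

```python
from collections import defaultdict


def find_collisions(event, delimiter="_"):
    """Return flattened names that more than one source path maps to."""
    sources = defaultdict(list)

    def walk(node, path):
        for key, value in node.items():
            current = path + [key]
            if isinstance(value, dict):
                walk(value, current)
            else:
                sources[delimiter.join(current)].append(current)

    walk(event, [])
    return {name: paths for name, paths in sources.items() if len(paths) > 1}


event = {
    "reading_temperature": 21.0,       # top-level name already contains "_"
    "reading": {"temperature": 25.3},  # flattens to the same name with "_"
}

collisions = find_collisions(event, "_")
no_collisions = find_collisions(event, "::")
print(collisions)
print(no_collisions)
```

With `_`, both fields end up as `reading_temperature`. With a different 
delimiter such as `::` the names stay distinct for this event, but a 
configurable delimiter only moves the problem unless the adapter also checks 
for collisions.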


### Topic 2: Handling Arrays in Nested Structures

Arrays within nested structures pose a particular challenge, especially when 
they contain objects and we don't know how many elements will be present. One 
approach could be to iterate over the array and flatten each object with its 
index added to the property names. However, this raises the question of 
whether such a transformation always makes sense, especially when the array 
length varies or the array itself contains nested lists.

#### Example:

Consider an event with an array of objects:

```json
{
  "sensor_id": "sensor_1",
  "measurements": [
    {"type": "temperature", "value": 25.3},
    {"type": "humidity", "value": 60.2}
  ]
}
```

Flattening this could result in:

```json
{
  "sensor_id": "sensor_1",
  "measurements_0_type": "temperature",
  "measurements_0_value": 25.3,
  "measurements_1_type": "humidity",
  "measurements_1_value": 60.2
}
```

This approach preserves the data, but it produces different event structures 
depending on the number of array elements. StreamPipes cannot handle this, 
especially when the data is stored, because we always expect a fixed event 
schema.
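
The following Python sketch shows how index-based flattening makes the set of 
field names depend on the array length. The function name 
`flatten_with_arrays` and the third measurement (`pressure`) are made up for 
illustration and are not part of StreamPipes.

```python
def flatten_with_arrays(value, delimiter="_", prefix=""):
    """Flatten nested objects and arrays, using the array index as a key.

    Hypothetical helper; the top-level value is expected to be an object.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}

    flat = {}
    for key, child in items:
        name = f"{prefix}{delimiter}{key}" if prefix else str(key)
        flat.update(flatten_with_arrays(child, delimiter, name))
    return flat


two = {
    "sensor_id": "sensor_1",
    "measurements": [
        {"type": "temperature", "value": 25.3},
        {"type": "humidity", "value": 60.2},
    ],
}
# Same source, but this event carries one more (made-up) measurement.
three = {
    "sensor_id": "sensor_1",
    "measurements": two["measurements"] + [{"type": "pressure", "value": 1013.2}],
}

schema_two = set(flatten_with_arrays(two))
schema_three = set(flatten_with_arrays(three))
extra_fields = schema_three - schema_two
print(sorted(extra_fields))
```

The second event introduces the fields `measurements_2_type` and 
`measurements_2_value`, which do not exist in the schema derived from the first 
event.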

#### Open Questions:

- **T2.Q1:** Should we support objects within arrays, or only support arrays 
  with primitive data types?
- **T2.Q2:** How could we handle arrays with objects?
- **T2.Q3:** Can we assume that arrays will always have the same number of 
  elements and that their order will not change? This assumption will probably 
  rarely hold true.
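
As a basis for discussing **T2.Q1**, here is a Python sketch of the 
"primitive arrays only" option: nested objects are flattened, arrays of 
primitives are kept as a single field, and arrays containing objects or nested 
arrays are rejected. The function name `flatten_primitive_arrays_only`, the 
sample field names, and the choice to raise an error are assumptions. Other 
behaviors (e.g. dropping the field) are equally possible.

```python
def flatten_primitive_arrays_only(event, delimiter="_", prefix=""):
    """Flatten nested objects; keep primitive arrays; reject other arrays.

    Hypothetical helper for illustration, not part of StreamPipes.
    """
    flat = {}
    for key, value in event.items():
        name = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_primitive_arrays_only(value, delimiter, name))
        elif isinstance(value, list):
            if any(isinstance(item, (dict, list)) for item in value):
                raise ValueError(
                    f"array '{name}' contains objects or nested arrays")
            flat[name] = value  # the array stays one field in the schema
        else:
            flat[name] = value
    return flat


supported = flatten_primitive_arrays_only(
    {"sensor_id": "sensor_1", "reading": {"values": [25.3, 25.4]}}
)
print(supported)
```

With this policy the field names no longer depend on the array length, at the 
cost of not supporting events like the `measurements` example above.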

---

Looking forward to further feedback.

GitHub link: 
https://github.com/apache/streampipes/discussions/3150#discussioncomment-10393757
