GitHub user tenthe created a discussion: Connect Library Refactoring:
Introducing Function Transformation Step For Adapter Creation
I'd like to propose a significant refactoring and enhancement of the
StreamPipes Connect module.
The goal is to **simplify the application logic, improve the user experience,
and vastly increase the flexibility** when connecting to new data sources,
especially those with complex data structures.
I've already started a prototype implementation in PR #4033, which helped
validate the approach and confirmed the potential for a much simpler connection
creation process.
### ❓ The Problem with Current Capabilities
The current Connect module faces three key limitations that complicate its use
for end-users:
1. **Complex JSON Structures:** Applying necessary data transformations (e.g.,
extracting values from nested properties) to complex or deeply nested JSON
structures is tedious because it relies on a rigid set of pre-defined
transformation rules.
2. **Rigid Adapter Schema:** Connecting a new, unique data source often
requires implementing a custom adapter just to handle source-specific parsing
or schema mapping logic.
3. **Reconstructing Original Event During Editing:** Since we currently do not
store the event schema before and after rule transformations, complex
application logic is required to infer the schema when a user edits an adapter.
### ✨ Proposed Solution: Function Transformation Step
I propose introducing a **Function Transformation Step** as the primary
mechanism for preprocessing incoming events within the Connect module.
The core idea is to allow users to define a script (e.g., **JavaScript**) that
takes the original event (a map) and returns the transformed event (a new map).
```javascript
// Example of the function signature
function transformEvent(event) {
  // Transformation logic here...
  return event;
}
```
This change addresses the limitations above by providing a flexible mechanism
for defining complex parsing and schema mapping logic directly in the script.
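To make this concrete, here is a minimal sketch of a script that flattens a nested payload. The payload shape and the field names (`device`, `readings`, `timestamp`) are invented for this example and do not come from the prototype.

```javascript
// Hypothetical raw event, as an adapter might hand it to the script.
const sampleEvent = {
  device: { id: "sensor-01", location: { hall: "A" } },
  readings: { temperature: { value: 21.5, unit: "C" } },
  timestamp: "2024-01-01T00:00:00Z",
};

function transformEvent(event) {
  // Build a new flat map instead of mutating the input.
  return {
    deviceId: event.device.id,
    hall: event.device.location.hall,
    temperature: event.readings.temperature.value,
    // Convert the ISO-8601 string to epoch milliseconds.
    time: new Date(event.timestamp).getTime(),
  };
}

console.log(transformEvent(sampleEvent));
```

Returning a fresh object keeps the original sample untouched, which should make it easier to show input and output side by side.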
#### Key Advantages:
* **Increased Flexibility:** A scripting language allows for arbitrary logic,
such as extracting values from complex JSON paths, performing conditional
logic, or applying custom mathematical transformations.
* **Simplified Connection Creation:** The adapter's sole responsibility is
reading the source data and providing it as a raw `Map<String, Object>`. The
complex transformation and schema inference logic is centralized in the
Function step and the backend.
* **Support for Diverse Data Sources:** We can provide sample scripts to
support multiple data sources without needing to implement a new Java adapter
for each.
* **Enhanced Editability (Sample Event Storage):** The adapter will store a
sample event, which allows users to edit adapters later on (especially for
brokers) without requiring a live event to be received when clicking "edit."
### 🔄 Impact on Transformation Rules
This new approach would replace most of the existing rigid
`TransformationRuleDescription` classes with flexible scripting logic.
Here is a table showing how current rules would be mapped to the new
function-based approach:
| Rule Type | Current TransformationRuleDescription | New Function Script Equivalent (JavaScript) |
| :--- | :--- | :--- |
| **Schema** | `DeleteRuleDescription` | `delete event.propertyName` |
| **Schema** | `MoveRuleDescription` | `event.newValue = event.oldNested.oldValue` |
| **Schema** | `RenameRuleDescription` | `event.newName = event.oldName` |
| **Value** | `AddTimestampRuleDescription` | `event.time = Date.now()` |
| **Value** | `AddValueTransformationRuleDescription` | `event.newValue = 0.1` |
| **Value** | `CorrectionValueTransformationRuleDescription` | `event.value = event.value * 2` |
| **Value** | `RegexTransformationRuleDescription` | `event.value = event.value.replace(/sensor/i, "")` |
| **Value** | `TimestampTranfsformationRuleDescription` | `event.time = new Date(event.time).getTime()` |
| **Stream** | `EventRateTransformationRuleDescription` | **No Change** (Applied on event stream) |
| **Stream** | `RemoveDuplicatesTransformationRuleDescription` | **No Change** (Applied on event stream) |
| **Value** | `ChangeDatatypeTransformationRuleDescription` | **No Change** |
| **Value** | `UnitTransformRuleDescription` | **No Change** (calculation factors required) |
The few rules that won't change are those that operate *after* the schema has
been inferred or require specialized calculations/metadata (like Unit
Transformation).
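Several of the script equivalents can be combined in a single function. The following is a sketch only: the property names are the placeholders from the table, and `label` is an extra hypothetical string field added so the regex example has something to work on. A plain assignment copies a value, so the sketch also deletes the source property to complete a move or rename.

```javascript
function transformEvent(event) {
  // DeleteRuleDescription: drop a property.
  delete event.propertyName;

  // MoveRuleDescription: copy the nested value up, then remove the source.
  event.newValue = event.oldNested.oldValue;
  delete event.oldNested.oldValue;

  // RenameRuleDescription: copy under the new name, then remove the old one.
  event.newName = event.oldName;
  delete event.oldName;

  // CorrectionValueTransformationRuleDescription: multiply by a factor.
  event.value = event.value * 2;

  // RegexTransformationRuleDescription: strip "sensor" (case-insensitive).
  event.label = event.label.replace(/sensor/i, "");

  // TimestampTranfsformationRuleDescription: date string to epoch milliseconds.
  event.time = new Date(event.time).getTime();

  return event;
}

// Hypothetical sample event for trying the script.
const sample = {
  propertyName: "obsolete",
  oldNested: { oldValue: 42 },
  oldName: "a",
  value: 1.5,
  label: "Sensor-7",
  time: "2024-01-01T00:00:00Z",
};
const transformed = transformEvent(sample);
console.log(transformed);
```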
### 🎨 Updated User Flow
The new user flow for adapter configuration:
1. **Settings (No Change):** Basic adapter setup (e.g., broker credentials).
2. **Configure Schema (New View):** This is where the function transformation
happens.
    * **Input:** Shows a sample raw event (Original/Parsed).
    * **Script Editor:** Provides a code editor for the user to write/edit
    the `transformEvent` script.
    * **Output:** Shows the result of applying the script to the sample event.
3. **Configure Fields (Simplified View):** The backend infers the event schema
(runtime names and data types) from the resulting event of step 2.
    * **Configuration:** Users select the timestamp field, configure unit
    transformations, or apply data type changes.
4. **Start Adapter (No Change):** Final review and deployment.
<img width="2255" height="877" alt="grafik"
src="https://github.com/user-attachments/assets/1e756cbc-33f1-4d3c-aea9-82466b086b9a"
/>
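The schema inference in the "Configure Fields" step could work roughly like the sketch below. This only illustrates the idea in JavaScript: the actual inference would happen in the backend, and the function names and type labels (`integer`, `float`, `nested`, ...) are placeholders, not the StreamPipes vocabulary.

```javascript
// Illustrative only: maps a single value to a coarse type label.
function inferType(value) {
  if (value === null || value === undefined) return "unknown";
  if (Array.isArray(value)) return "list";
  if (typeof value === "number") {
    // JavaScript cannot tell 21.0 from 21, so one sample may mislabel a float.
    return Number.isInteger(value) ? "integer" : "float";
  }
  if (typeof value === "object") return "nested";
  return typeof value; // "string" or "boolean" for typical JSON values
}

// Derives one schema entry (runtime name plus type) per top-level property.
function inferSchema(event) {
  return Object.entries(event).map(([runtimeName, value]) => ({
    runtimeName,
    type: inferType(value),
  }));
}

// Hypothetical transformed event from step 2.
const transformedEvent = {
  deviceId: "sensor-01",
  temperature: 21.5,
  count: 3,
  active: true,
};
console.log(inferSchema(transformedEvent));
```

Because the inference sees only one sample, users would still need the "Configure Fields" view to correct data types where the sample is ambiguous.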
### ⚙️ Technical and Migration Details
This change introduces significant modifications to the UI, the data model, and
the backend implementation.
#### Backend REST Endpoints:
We will introduce new REST endpoints to support the live-editing and preview
capabilities in the UI:
* `/sample`: Connects to the extension service to receive a sample event from
the data source.
* `/sample/transform`: Takes the sample event and the script, returning the
new transformed event structure for UI preview.
* `/schema`: Takes the resulting event (from the transformation) and returns
the inferred `EventSchema` object.
* `/schema/preview`: Takes the transformed event and applies the post-script
value transformations (like Unit/Datatype changes) for a final preview.
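Conceptually, the `/sample/transform` preview applies the user's script to the stored sample and reports either the result or an error. The sketch below shows that logic in plain JavaScript. `previewTransform` and its return shape are hypothetical, and `new Function` provides no sandboxing, so this says nothing about how the backend would actually evaluate scripts.

```javascript
// Hypothetical helper: evaluates script text against a sample event.
function previewTransform(scriptSource, sampleEvent) {
  try {
    // Compile the script text and pull out its transformEvent function.
    const transformEvent = new Function(
      `${scriptSource}\nreturn transformEvent;`
    )();
    // Work on a deep copy so the stored sample is not modified.
    const copy = JSON.parse(JSON.stringify(sampleEvent));
    return { event: transformEvent(copy), error: null };
  } catch (err) {
    // Syntax and runtime errors are returned instead of thrown.
    const message = err instanceof Error ? err.message : String(err);
    return { event: null, error: message };
  }
}

const userScript = `
function transformEvent(event) {
  event.temperature = event.payload.temp;
  delete event.payload;
  return event;
}`;
const storedSample = { payload: { temp: 21.5 } };
console.log(previewTransform(userScript, storedSample));
```

Returning errors as data rather than throwing would let the UI show script problems next to the editor while the user is still typing.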
#### Data Model Migration:
The most crucial requirement is that **all existing adapters must be migrated
automatically** so that they continue to function without any manual change
required by the user.
* The existing `AdapterDescription` contains a list of
`TransformationRuleDescription`.
* During the migration, we must implement logic to automatically convert
these legacy rule descriptions into an equivalent Function Script, which will
then be stored in the updated data model.
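The conversion could follow a simple pattern: each legacy rule with a script equivalent emits one statement, and everything else is carried over unchanged. The sketch below assumes simplified rule objects. The field names (`key`, `from`, `to`, `factor`) and the `migrateRules` function are hypothetical and do not reflect the real `TransformationRuleDescription` fields.

```javascript
// Quote a property name safely for use inside generated code.
const q = (name) => JSON.stringify(name);

// One generator per rule type that has a script equivalent (hypothetical fields).
const scriptGenerators = {
  DeleteRuleDescription: (r) => `delete event[${q(r.key)}];`,
  RenameRuleDescription: (r) =>
    `event[${q(r.to)}] = event[${q(r.from)}]; delete event[${q(r.from)}];`,
  AddTimestampRuleDescription: (r) => `event[${q(r.key)}] = Date.now();`,
  CorrectionValueTransformationRuleDescription: (r) =>
    `event[${q(r.key)}] = event[${q(r.key)}] * ${Number(r.factor)};`,
};

function migrateRules(rules) {
  const lines = [];
  const remainingRules = []; // rules that stay rules (e.g. unit, stream)
  for (const rule of rules) {
    if (Object.hasOwn(scriptGenerators, rule.type)) {
      lines.push("  " + scriptGenerators[rule.type](rule));
    } else {
      remainingRules.push(rule);
    }
  }
  const script = [
    "function transformEvent(event) {",
    ...lines,
    "  return event;",
    "}",
  ].join("\n");
  return { script, remainingRules };
}

// Hypothetical legacy rule list, in the order the rules were applied.
const legacyRules = [
  { type: "DeleteRuleDescription", key: "debug" },
  { type: "RenameRuleDescription", from: "temp", to: "temperature" },
  { type: "CorrectionValueTransformationRuleDescription", key: "temperature", factor: 2 },
  { type: "UnitTransformRuleDescription", key: "temperature" },
];
const migrated = migrateRules(legacyRules);
console.log(migrated.script);
```

Emitting the statements in the original rule order matters here, since a later rule may refer to a property that an earlier rule renamed.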
### Feedback
I believe these changes will give users considerably more flexibility while
remaining straightforward to apply. Do you have any thoughts on the changes?
GitHub link: https://github.com/apache/streampipes/discussions/4048