GitHub user tenthe created a discussion: Connect Library Refactoring: 
Introducing Function Transformation Step For Adapter Creation

I'd like to propose a significant refactoring and enhancement of the 
StreamPipes Connect module. 
The goal is to **simplify the application logic, improve the user experience, 
and vastly increase the flexibility** when connecting to new data sources, 
especially those with complex data structures.

I've already started a prototype implementation in PR #4033, which helped 
validate the approach and confirmed the potential for a much simpler connection 
creation process.

### ❓ The Problem with Current Capabilities

The current Connect module faces three key limitations that complicate its use 
for end-users:

1.  **Complex JSON Structures:** Applying necessary data transformations (e.g., 
extracting values from nested properties) to complex or deeply nested JSON 
structures is currently tedious and overly complicated, relying on a rigid set 
of pre-defined transformation rules.
2.  **Rigid Adapter Schema:** Connecting new, unique data sources often 
requires the implementation of a new, custom adapter just to handle a specific 
parsing or schema mapping logic.
3. **Reconstructing Original Event During Editing:** Since we currently do not 
store the event schema before and after rule transformations, complex 
application logic is required to infer the schema when a user edits an adapter.

### ✨ Proposed Solution: Function Transformation Step

I propose introducing a **Function Transformation Step** as the primary 
mechanism for preprocessing incoming events within the Connect module.

The core idea is to allow users to define a script (e.g., **JavaScript**) that 
takes the original event (a map) and returns the transformed event (a new map).

```javascript
// Example of the function signature
function transformEvent(event) {
  // Transformation logic here...
  return event;
}
```

This change addresses the first two limitations directly, because complex 
parsing and schema mapping logic can be defined in the script itself. The third 
is addressed by storing a sample event with the adapter (see Key Advantages).
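
As an illustration of what such a script could look like for a nested payload, 
here is a minimal sketch. The event structure and all field names (`device`, 
`readings`, `ts`) are made up for this example and do not come from an existing 
adapter.

```javascript
// Hypothetical raw event as an adapter might deliver it (illustrative fields).
const rawEvent = {
  device: { id: "sensor-01", location: { site: "plant-a" } },
  readings: [
    { type: "temperature", value: 21.5 },
    { type: "pressure", value: 1.02 },
  ],
  ts: "2024-01-01T00:00:00Z",
};

function transformEvent(event) {
  // Build a flat event from the nested input.
  const flat = {
    deviceId: event.device.id,
    site: event.device.location.site,
    // Convert the ISO timestamp string to epoch milliseconds.
    timestamp: new Date(event.ts).getTime(),
  };
  // Turn each entry of the readings array into its own top-level property.
  for (const reading of event.readings) {
    flat[reading.type] = reading.value;
  }
  return flat;
}

const transformed = transformEvent(rawEvent);
console.log(transformed);
```

Returning a new map instead of mutating the input keeps the original sample 
event available for comparison in the UI.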

#### Key Advantages:

  * **Increased Flexibility:** A scripting language allows for arbitrary logic, 
such as extracting values from complex JSON paths, performing conditional 
logic, or applying custom mathematical transformations.
  * **Simplified Connection Creation:** The adapter's sole responsibility is 
reading the source data and providing it as a raw `Map<String, Object>`. The 
complex transformation and schema inference logic is centralized in the 
Function step and the backend.
  * **Support for Diverse Data Sources:** We can provide sample scripts to 
support multiple data sources without needing to implement a new Java adapter 
for each.
  * **Enhanced Editability (Sample Event Storage):** The adapter will store a 
sample event, which allows users to edit adapters later on (especially for 
brokers) without requiring a live event to be received when clicking "edit."

### 🔄 Impact on Transformation Rules

This new approach would replace most of the existing rigid 
`TransformationRuleDescription` classes with flexible scripting logic.

Here is a table showing how current rules would be mapped to the new 
function-based approach:

| Rule Type | Current TransformationRuleDescription | New Function Script Equivalent (JavaScript) |
| :--- | :--- | :--- |
| **Schema** | `DeleteRuleDescription` | `delete event.propertyName` |
| **Schema** | `MoveRuleDescription` | `event.newValue = event.oldNested.oldValue` |
| **Schema** | `RenameRuleDescription` | `event.newName = event.oldName` |
| **Value** | `AddTimestampRuleDescription` | `event.time = Date.now()` |
| **Value** | `AddValueTransformationRuleDescription` | `event.newValue = 0.1` |
| **Value** | `CorrectionValueTransformationRuleDescription` | `event.value = event.value * 2` |
| **Value** | `RegexTransformationRuleDescription` | `event.value = event.value.replace(/sensor/i, "")` |
| **Value** | `TimestampTranfsformationRuleDescription` | `event.time = new Date(event.time).getTime()` |
| **Stream** | `EventRateTransformationRuleDescription` | **No Change** (Applied on event stream) |
| **Stream** | `RemoveDuplicatesTransformationRuleDescription` | **No Change** (Applied on event stream) |
| **Value** | `ChangeDatatypeTransformationRuleDescription` | **No Change** |
| **Value** | `UnitTransformRuleDescription` | **No Change** (calculation factors required) |

The few rules that won't change are those that operate *after* the schema has 
been inferred or require specialized calculations/metadata (like Unit 
Transformation).
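
To show how several of the one-liners from the table could combine in a single 
script, here is a small sketch. The sample event and its property names are 
invented for illustration. Note one assumption: for the move and rename cases 
the sketch also deletes the source property, which the one-line equivalents in 
the table leave implicit.

```javascript
function transformEvent(event) {
  // Delete: drop a property that is not needed downstream.
  delete event.debugInfo;

  // Move: lift a nested value to the top level, then remove the old location.
  event.temperature = event.measurements.temperature;
  delete event.measurements;

  // Rename: copy to the new name, then remove the old one.
  event.deviceName = event.name;
  delete event.name;

  // Add value: attach a static value.
  event.offset = 0.1;

  // Correction value: scale a numeric value.
  event.temperature = event.temperature * 2;

  // Regex: strip a prefix, case-insensitively.
  event.deviceName = event.deviceName.replace(/sensor/i, "");

  // Timestamp: convert an ISO string to epoch milliseconds.
  event.time = new Date(event.time).getTime();

  return event;
}

// Hypothetical sample event.
const result = transformEvent({
  name: "Sensor42",
  debugInfo: "ignored",
  measurements: { temperature: 10.5 },
  time: "2024-01-01T00:00:00Z",
});
console.log(result);
```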

### 🎨 Updated User Flow

The new user flow for adapter configuration:

1.  **Settings (No Change):** Basic adapter setup (e.g., broker credentials).
2.  **Configure Schema (New View):** This is where the function transformation 
happens.
      * **Input:** Shows a sample raw event (Original/Parsed).
      * **Script Editor:** Provides a code editor for the user to write/edit 
the `transformEvent` script.
      * **Output:** Shows the result of applying the script to the sample event.
3.  **Configure Fields (Simplified View):** The backend infers the event schema 
(runtime names and data types) from the resulting event of step 2.
      * **Configuration:** Users select the timestamp field, configure unit 
transformations, or apply data type changes.
4.  **Start Adapter (No Change):** Final review and deployment.
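
The schema inference in the Configure Fields step can be pictured with the 
following sketch. It is written in JavaScript only to stay consistent with the 
script examples; the actual inference would run in the backend. The function 
names and the type labels (`integer`, `float`, `nested`, ...) are placeholders 
chosen for this example, not the data types StreamPipes uses internally.

```javascript
// Hypothetical helper: derive a simple type label from a sample value.
function inferType(value) {
  if (value === null || value === undefined) return "unknown";
  if (Array.isArray(value)) return "list";
  if (typeof value === "number") {
    return Number.isInteger(value) ? "integer" : "float";
  }
  if (typeof value === "object") return "nested";
  // For JSON-like data the remaining cases are "string" and "boolean".
  return typeof value;
}

// Hypothetical helper: one schema entry per property of the transformed event.
function inferSchema(event) {
  return Object.entries(event).map(([runtimeName, value]) => ({
    runtimeName,
    runtimeType: inferType(value),
  }));
}

const schema = inferSchema({
  deviceId: "sensor-01",
  temperature: 21.5,
  count: 3,
  active: true,
  tags: ["a", "b"],
});
console.log(schema);
```

A single sample cannot distinguish a float that happens to be a whole number 
from an integer, which is one reason users would still need the option to apply 
data type changes in this step.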

<img width="2255" height="877" alt="grafik" src="https://github.com/user-attachments/assets/1e756cbc-33f1-4d3c-aea9-82466b086b9a" />


### ⚙️ Technical and Migration Details

This change introduces significant modifications to the UI, the data model, and 
the backend implementation.

#### Backend REST Endpoints:

We will introduce new REST endpoints to support the live-editing and preview 
capabilities in the UI:

  * `/sample`: Connects to the extension service to receive a sample event from 
the data source.
  * `/sample/transform`: Takes the sample event and the script, returning the 
new transformed event structure for UI preview.
  * `/schema`: Takes the resulting event (from the transformation) and returns 
the inferred `EventSchema` object.
  * `/schema/preview`: Takes the transformed event and applies the post-script 
value transformations (like Unit/Datatype changes) for a final preview.
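
To make the contract of `/sample/transform` more concrete, here is a rough 
sketch of the core logic: take a sample event and a script, return either the 
transformed event or an error message for the UI. The function name 
`previewTransform` and the shape of the result object are assumptions for this 
example. It uses the JavaScript `Function` constructor purely for illustration, 
which provides no isolation; a real backend would need a sandboxed script 
engine.

```javascript
// Hypothetical core of a transform preview (not the actual implementation).
function previewTransform(sampleEvent, script) {
  // Work on a deep copy so the stored sample event is never mutated.
  const input = JSON.parse(JSON.stringify(sampleEvent));
  try {
    // Evaluate the script and pick up the transformEvent function it defines.
    const transformEvent = new Function(`${script}\nreturn transformEvent;`)();
    return { success: true, event: transformEvent(input) };
  } catch (error) {
    // Syntax errors and runtime errors are both reported back to the caller.
    return { success: false, message: String(error) };
  }
}

const sample = { payload: { temperature: 21.5 }, id: "sensor-01" };
const script = `
function transformEvent(event) {
  event.temperature = event.payload.temperature;
  delete event.payload;
  return event;
}`;

const preview = previewTransform(sample, script);
console.log(preview);

// A broken script yields an error result instead of throwing.
const failed = previewTransform(sample, "function transformEvent(event) {");
console.log(failed.success);
```

Reporting errors as data rather than as a failed request would let the editor 
show script problems inline while the user is typing.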

#### Data Model Migration:

The most crucial requirement is that **all existing adapters must be migrated 
automatically**, so that they continue to function without any manual change by 
the user.

  * The existing `AdapterDescription` contains a list of 
`TransformationRuleDescription`.
  * During the migration, we must implement logic to automatically convert 
these legacy rule descriptions into an equivalent Function Script, which will 
then be stored in the updated data model.
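
The conversion could work by emitting one or more script statements per legacy 
rule and wrapping them in a `transformEvent` function. The sketch below shows 
the idea in JavaScript for consistency with the other examples; the migration 
itself would be implemented in the backend. The rule objects here are heavily 
simplified stand-ins: their `type` values and field names are hypothetical and 
do not mirror the real `TransformationRuleDescription` classes, and only 
top-level keys are handled.

```javascript
// Hypothetical, simplified legacy rules (not the real class structure).
const legacyRules = [
  { type: "delete", key: "debugInfo" },
  { type: "rename", oldKey: "name", newKey: "deviceName" },
  { type: "multiply", key: "value", factor: 2 },
  { type: "addTimestamp", key: "time" },
];

// Quote the key safely so unusual property names still produce valid code.
const prop = (key) => `event[${JSON.stringify(key)}]`;

// Translate one rule into the script statements that replace it.
function ruleToStatements(rule) {
  switch (rule.type) {
    case "delete":
      return [`delete ${prop(rule.key)};`];
    case "rename":
      return [
        `${prop(rule.newKey)} = ${prop(rule.oldKey)};`,
        `delete ${prop(rule.oldKey)};`,
      ];
    case "multiply":
      return [`${prop(rule.key)} = ${prop(rule.key)} * ${Number(rule.factor)};`];
    case "addTimestamp":
      return [`${prop(rule.key)} = Date.now();`];
    default:
      throw new Error(`No script equivalent for rule type: ${rule.type}`);
  }
}

// Wrap all generated statements in a transformEvent function.
function rulesToScript(rules) {
  const body = rules
    .flatMap(ruleToStatements)
    .map((line) => `  ${line}`)
    .join("\n");
  return `function transformEvent(event) {\n${body}\n  return event;\n}`;
}

const migratedScript = rulesToScript(legacyRules);
console.log(migratedScript);
```

Keeping the rules in their original order matters, since later rules may refer 
to properties that earlier rules created or renamed.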

### Feedback
I believe these changes will greatly increase flexibility for users while 
remaining straightforward to apply. Do you have any thoughts on the changes?


GitHub link: https://github.com/apache/streampipes/discussions/4048
