Re: [D] Refactor: Decoupling Direct Database Connection From ComputingUnitMaster & ComputingUnitWorker [texera]

via GitHub Sat, 06 Jun 2026 11:59:00 -0700


GitHub user bobbai00 added a comment to the discussion: Refactor: Decoupling 
Direct Database Connection From ComputingUnitMaster & ComputingUnitWorker


Here is the layout of the physical plan:

# Physical Plan Spec

Layout:

```json
{
  "operators": [...list of physical operators...],
  "links": [...list of physical links...]
}
```

## Physical Operator Spec

```json
{
  "id": {
    "logicalOpId": { "id": "CSVScanSource-operator-id" },
    "layerName": "main"                                   // distinguishes physical stages of the
                                                          // same logical op: main | partial | final
  },
  "workflowId": { "id": 0 },                              
  "executionId": { "id": 1 },                            

  "opExecInitInfo": {                                     // tells Amber how to construct the
                                                          // runtime executor
    // JVM operators use kind "className":
    "kind": "className",
    "className": "org.apache.texera.amber.operator.source.scan.csv.CSVScanSourceOpExec",
    "descString": "{...a JSON STRING that describes the properties of the physical operator...}"
    // For scan sources (CSV/JSONL/Arrow/file), the source path lives here as `fileName`.
    // It looks like `dataset:///dataset-15/versionHash/raw/data.csv`
    // (if the file is resolved on the local file system, it starts with `file:///...`).
    // UDF operators use kind "code" instead:
    //   { "kind": "code", "code": "class ProcessTupleOperator(...): ...", "language": "python" }
  },

  "parallelizable": true,
  "locationPreference": { "type": "roundRobin" },
  "partitionRequirement": [],                             // what each INPUT expects
                                                          // (array: one entry per input port)
    // null                                             -> no requirement for that input
    // { "type": "single" }                             -> gather into one partition
    // { "type": "hash", "hashAttributeNames": ["id"] } -> hash-partitioned by attributes
    // { "type": "broadcast" }                          -> broadcast to workers
    // { "type": "oneToOne" }                           -> partitioning maps one-to-one
    // { "type": "none" }                               -> no partitioning

  "partitionDeriveSpec": { "type": "passthrough" },       // what partitioning this operator
                                                          // PRODUCES
    // passthrough                                   -> preserve upstream partitioning
    // toSingle                                      -> produce a single partition
    // toHash + hashAttributeNames                   -> produce hash partitioning
    // toUnknown                                     -> partitioning unknown
    // projection                                    -> derive through projection

  "inputPortsSerialized": {},                             // map keyed "<portId>_<internalFlag>",
                                                          // e.g. "0_false"
  "outputPortsSerialized": {},                            // value = 2-item array:
                                                          // [portMetadata, schema|null]
    // portMetadata: { id:{id,internal}, displayName, blocking, mode }
    //   output `mode`: 0 = set snapshot | 1 = set delta | 2 = single snapshot
    // schema: { attributes: [ { attributeName, attributeType }, ... ] } or null
    //   attributeType: string | integer (32-bit) | long (64-bit) | double |
    //                  boolean | timestamp | binary | large_binary (pointer-like)

  "isOneToManyOp": false,
  "suggestedWorkerNum": 1,
  "pveName": ""
}
```
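
To make the port-map encoding concrete, here is a minimal Python sketch of how a client might read `outputPortsSerialized`. It is not Texera code: the helper names (`parse_port_key`, `describe_output_ports`) and the sample operator are hypothetical, and it assumes the internal flag in the key is always the lowercase string `true` or `false`, as the `"0_false"` example suggests.

```python
# Output-mode codes as listed in the spec comments above.
OUTPUT_MODES = {0: "set snapshot", 1: "set delta", 2: "single snapshot"}


def parse_port_key(key):
    """Split a key such as "0_false" into (port_id, internal).

    Assumption: the flag is serialized as lowercase "true"/"false".
    """
    port_id, _, internal = key.partition("_")
    return int(port_id), internal == "true"


def describe_output_ports(operator):
    """Summarise each entry of outputPortsSerialized for one physical operator."""
    summary = []
    for key, (metadata, schema) in operator["outputPortsSerialized"].items():
        port_id, internal = parse_port_key(key)
        # The schema slot may be null (None) when no schema is attached.
        attributes = [] if schema is None else [
            (a["attributeName"], a["attributeType"]) for a in schema["attributes"]
        ]
        summary.append({
            "portId": port_id,
            "internal": internal,
            "mode": OUTPUT_MODES.get(metadata["mode"], "unknown"),
            "attributes": attributes,
        })
    return summary


# Hypothetical operator fragment, hand-written for illustration only.
example_operator = {
    "outputPortsSerialized": {
        "0_false": [
            {"id": {"id": 0, "internal": False}, "displayName": "",
             "blocking": False, "mode": 0},
            {"attributes": [{"attributeName": "id", "attributeType": "integer"}]},
        ]
    }
}

ports = describe_output_ports(example_operator)
print(ports)
```

The same key parsing should apply to `inputPortsSerialized`, since both maps are described with the same key format.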

## Physical Link Spec

Each item in `links` connects one physical output port to one physical input 
port.

```json
{
  "fromOpId": {
    "logicalOpId": { "id": "source-op-id" },
    "layerName": "main"
  },
  "fromPortId": { "id": 0, "internal": false },
  "toOpId": {
    "logicalOpId": { "id": "target-op-id" },
    "layerName": "main"
  },
  "toPortId": { "id": 0, "internal": false }
}
```
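
If this reading of the link format is right, a plan can be sanity-checked by confirming that every link endpoint refers to an operator and port that actually exist. The Python sketch below does that under the assumptions in this comment (operator identity is `logicalOpId` plus `layerName`, and port maps are keyed `"<portId>_<internalFlag>"`). All function names and the sample plan are hypothetical.

```python
def op_key(op_id):
    """Identify a physical operator by (logical operator id, layer name)."""
    return (op_id["logicalOpId"]["id"], op_id["layerName"])


def port_key(port_id):
    """Rebuild the "<portId>_<internalFlag>" map key, e.g. "0_false"."""
    return f'{port_id["id"]}_{str(port_id["internal"]).lower()}'


def find_dangling_links(plan):
    """Return a message for every link endpoint that cannot be resolved."""
    operators = {op_key(op["id"]): op for op in plan["operators"]}
    problems = []
    for link in plan["links"]:
        for side, op_field, port_field, port_map in (
            ("from", "fromOpId", "fromPortId", "outputPortsSerialized"),
            ("to", "toOpId", "toPortId", "inputPortsSerialized"),
        ):
            op = operators.get(op_key(link[op_field]))
            if op is None:
                problems.append(f"{side}: unknown operator {op_key(link[op_field])}")
            elif port_key(link[port_field]) not in op[port_map]:
                problems.append(f"{side}: unknown port {port_key(link[port_field])}")
    return problems


def make_op(name, inputs, outputs):
    """Build a stripped-down operator with only the fields this check reads."""
    return {
        "id": {"logicalOpId": {"id": name}, "layerName": "main"},
        "inputPortsSerialized": {k: [{}, None] for k in inputs},
        "outputPortsSerialized": {k: [{}, None] for k in outputs},
    }


# Hypothetical plan: the link targets input port 1, which does not exist.
plan = {
    "operators": [
        make_op("source-op-id", [], ["0_false"]),
        make_op("target-op-id", ["0_false"], []),
    ],
    "links": [{
        "fromOpId": {"logicalOpId": {"id": "source-op-id"}, "layerName": "main"},
        "fromPortId": {"id": 0, "internal": False},
        "toOpId": {"logicalOpId": {"id": "target-op-id"}, "layerName": "main"},
        "toPortId": {"id": 1, "internal": False},
    }],
}

problems = find_dangling_links(plan)
print(problems)
```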


@Yicong-Huang @aglinxinyuan @Xiao-zhen-Liu Is this interpretation accurate? If 
so, I don't think the physical plan contains any sensitive information, and we 
can safely expose it to the client.

GitHub link: 
https://github.com/apache/texera/discussions/5295#discussioncomment-17204773
