Re: [DISCUSS] ACBP × HL7: Calcite IR → multi-dialect SQL for deterministic policy artifacts (Why ACBP, benefits, MWE, questions)

Francis Chuang Tue, 19 Aug 2025 16:37:47 -0700

Hi Muteb,

Your messages are currently going into the moderation queue becauseyou're not subscribed to the list. Can you please subscribe per theseinstructions (under the Help heading):https://calcite.apache.org/community/#mailing-lists

The subscription process is automated, so you should receive an email toconfirm your subscription immediately after sending an email to thesubscription address.


Francis

On 20/08/2025 9:31 am, Muteb Alanazi wrote:

Subject: Re: ACBP × HL7: Calcite IR -> multi-dialect SQL (tests & scope)

Hi Julian, Calcite devs,

Thanks for the guidance—tests are the right shape. Quick clarifications and a 
few test cases inline.

ACBP model & translator (clarifications)
- ACBP has a configuration/model (JSON/YAML) that defines categories, derived 
boolean flags, and one deterministic CASE for the action.
- There is no separate runtime query language. The compiler/translator is a pure 
function: model x options x dialect -> SQL artifacts (primarily SELECT/CASE; 
DDL templated per dialect).
- Stateless: given the same inputs, it emits identical SQL.
- For this thread, we’ll focus on the SELECT core; CTAS/REPLACE is templated 
outside Calcite.
- Proposed translator API (pseudocode):

   String sql = acbpToSql(
     /* modelJson */ String,
     /* entry point, e.g., "decision_space_hl7" */ String,
     /* dialect */ SqlDialect,
     /* options, e.g., time window */ Map<String,Object>
   );

Quick terms: HL7 vs ACBP (how they relate)
- HL7 (context/data): HL7 v2 messages are canonicalized into regular tables 
(e.g., hl7_messages) with columns such as message_type (ADT/ORM/ORU), 
trigger_event (A01/A04/A03/...), patient_class (E/I/O), order_priority 
(STAT/ROUTINE), obs_abn_flag (H/L/N), loinc_code, event_ts, etc. In this 
thread, HL7 is just the domain schema we query over.
- ACBP (compiler/semantics): A policy compiler that, given the model JSON, 
defines flags (predicates over those columns) and a deterministic CASE that 
yields an action_id, then emits dialect-specific SQL. It does not parse HL7; it 
consumes those columns.

Mapping to your suggested harness
- Your "config" = ACBP model JSON (categories/flags/decision).
- A separate "query" string isn’t needed; we pass an entry point name (e.g., 
"decision_space_hl7").
- Dialect = target SqlDialect. Result = deterministic SQL string.

Below are three test cases (simple -> moderate). The JSON is trimmed to the 
essentials (categories/flags/rules). Expected SQL strings are the requirement.

Test 1 — Simple ADT (Admission/Discharge/Transfer) rule (PostgreSQL)
Model (JSON)
{
   "model": "hl7_v1",
   "from": "hl7_messages",
   "time_column": "event_ts",
   "categories": {
     "message_type": ["ADT","ORM","ORU"],
     "trigger_event": ["A01","A04","A03","O01","R01"],
     "patient_class": ["E","I","O"]
   },
   "flags": {
     "is_admission": "message_type = 'ADT' AND trigger_event IN ('A01','A04')",
     "is_emergency": "patient_class = 'E'"
   },
   "decision": [
     {"when": "is_admission AND is_emergency", "then": 2},
     {"default": 1}
   ]
}

Invocation
var sql = acbpToSql(modelJson, "decision_space_hl7",
     PostgresqlSqlDialect.DEFAULT,
     Map.of("windowDays", 2));

Expect
SELECT
   msg_id,
   event_ts,
   CASE
     WHEN (message_type = 'ADT' AND trigger_event IN ('A01','A04') AND 
patient_class = 'E') THEN 2
     ELSE 1
   END AS action_id
FROM hl7_messages
WHERE event_ts >= now() - interval '2 days'

Test 2 — ORU abnormal + critical LOINC (BigQuery)
Model (JSON)
{
   "model": "hl7_v1",
   "from": "hl7_messages",
   "time_column": "event_ts",
   "categories": {
     "message_type": ["ADT","ORM","ORU"],
     "obs_abn_flag": ["H","L","N"]
   },
   "refs": {
     "critical_loinc_table": "ref_critical_loinc"
   },
   "flags": {
     "abnormal_result": "message_type = 'ORU' AND obs_abn_flag IN ('H','L')",
     "critical_analyte": "loinc_code IN (SELECT code FROM ref_critical_loinc)"
   },
   "decision": [
     {"when": "abnormal_result AND critical_analyte", "then": 3},
     {"when": "abnormal_result", "then": 2},
     {"default": 1}
   ]
}

Invocation
var sql = acbpToSql(modelJson, "decision_space_hl7",
     BigQuerySqlDialect.DEFAULT,
     Map.of("windowDays", 2));

Expect
SELECT
   msg_id,
   event_ts,
   CASE
     WHEN (message_type = 'ORU' AND obs_abn_flag IN ('H','L') AND loinc_code IN 
(SELECT code FROM ref_critical_loinc)) THEN 3
     WHEN (message_type = 'ORU' AND obs_abn_flag IN ('H','L')) THEN 2
     ELSE 1
   END AS action_id
FROM hl7_messages
WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)

Test 3 — ORM STAT + abnormal (ClickHouse)
Model (JSON)
{
   "model": "hl7_v1",
   "from": "hl7_messages",
   "time_column": "event_ts",
   "categories": {
     "message_type": ["ADT","ORM","ORU"],
     "order_priority": ["STAT","ROUTINE"],
     "obs_abn_flag": ["H","L","N"]
   },
   "flags": {
     "is_stat_order": "message_type = 'ORM' AND order_priority = 'STAT'",
     "abnormal_result": "obs_abn_flag IN ('H','L')"
   },
   "decision": [
     {"when": "is_stat_order AND abnormal_result", "then": 3},
     {"when": "is_stat_order", "then": 2},
     {"default": 1}
   ]
}

Invocation
var sql = acbpToSql(modelJson, "decision_space_hl7",
     ClickHouseSqlDialect.DEFAULT,
     Map.of("windowDays", 2));

Expect
SELECT
   msg_id,
   event_ts,
   CASE
     WHEN (message_type = 'ORM' AND order_priority = 'STAT' AND obs_abn_flag IN 
('H','L')) THEN 3
     WHEN (message_type = 'ORM' AND order_priority = 'STAT') THEN 2
     ELSE 1
   END AS action_id
FROM hl7_messages
WHERE event_ts >= now() - INTERVAL 2 DAY

Scope preference
For now, I’m leaning toward an external project that uses Calcite services 
(RelBuilder, RelToSqlConverter, SqlDialect) to generate the SELECT, with DDL 
templated per dialect. If we find re-usable gaps/idioms, I’m happy to 
contribute small PRs + golden tests in org.apache.calcite.sql.dialect.*. If the 
community later prefers a Calcite module, we can discuss the fit.

If these cases look right, I can provide a tiny repo with:
- RelBuilder construction of the HL7 plan,
- golden expected SQL for Postgres/BigQuery/ClickHouse,
- and a failing test if we hit any dialect edge (to drive a Jira).

Thanks again,
Muteb (DotK)

Note: I’m awaiting subscription confirmation for [email protected]; please CC 
me on replies.
________________________________
From: Julian Hyde <[email protected]>
Sent: Wednesday, August 20, 2025 1:58 AM
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] ACBP × HL7: Calcite IR → multi-dialect SQL for 
deterministic policy artifacts (Why ACBP, benefits, MWE, questions)

Thanks for reaching out, Muteb.

I’m not familiar with HL7 (though a google search reveals that is a standard in 
the healthcare) and ACBP is new (though I’m familiar with business 
rules/constraint languages), so forgive me if I’m a little slow to comprehend 
this.

I would like to figure out whether an integration between ACBP and Calcite 
would be a module within Calcite or an external project that uses various 
Calcite services (such as RelBuilder and the dialects).

Could the requirement be written in the form of a test case, something like 
this?

   String config = “…”; // a string in ABCP’s configuration language
   String query = “…”; // a string in the ABCP’s query language
   Dialect = …; // target dialect
   String sql = abcpToSql(config, query, dialect);
   assertThat(sql, is(“…”));

I have assumed that ABCP has a configuration language (akin to SQL DML, 
defining rules) and a runtime language (akin to SQL queries and DML), and that 
the translator is stateless. Let me know if these assumptions are valid.

If the formulation as test cases is appropriate, I would like to see few test 
cases, ranging from simple to moderately hard to implement.

Julian

PS If you are not subscribed yet, please subscribe ASAP. Until then, you can 
see replies in the email archive, 
https://lists.apache.org/thread/fcpko5d8orq5m3los0n65xhg3p1vn2rq.



On Aug 19, 2025, at 3:27 PM, Muteb Alanazi <[email protected]> wrote:

Hi Calcite devs,

I’m exploring Apache Calcite as the SQL generator for ACBP, a policy compiler 
that turns categorical/boolean models into deterministic, auditable SQL 
artifacts (views/CTAS) with formal checks (soundness/coverage/dedup).

Public overview (no private IP):
- Overview: https://dotkboy-web.github.io/acbp/
- Theorems: https://dotkboy-web.github.io/acbp/acbp_theorems.html
- ACBP Equation: https://dotkboy-web.github.io/acbp/ACBP-Equation.html
- DOI snapshot: https://zenodo.org/records/16891549

Why ACBP? (relevance to Calcite)
- Realistic workload: CASE-heavy categorical logic over clean tables, no 
vendor-specific UDFs → ideal for multi-dialect unparse coverage.
- Deterministic semantics: decisions must be identical across dialects → strong 
target for correctness/golden tests of RelToSql/dialects.
- Governance-first: compiler proves properties (soundness/coverage/dedup) and 
keeps decisions auditable; portability matters for users running multiple 
engines.
- Clean separation: ACBP owns the IR (categories/flags/rules); Calcite prints 
dialect SQL. Any gaps can become small PRs in sql.dialect.
- Give back: happy to contribute a tiny HL7 suite of golden tests (input plan → 
expected strings) and open JIRAs for dialect gaps.

Why this helps HL7/HIS (benefits & what’s novel)
- Operational: deterministic routing and noise control (silencing windows, 
dedup, escalation ladders) encoded as SQL CASE/joins.
- Governance/audit: soundness/coverage/dedup checks run in-DB; every action has 
explainable predicates + policy version for replay.
- Engineering: one IR, many dialects (PostgreSQL/BigQuery/ClickHouse/Spark) via 
Calcite; DDL templated per dialect.
- KPIs: fewer duplicate/low-value alerts, faster policy iteration with 
guardrails, and stable sub-second P95 on “what to do now” boards.
- Novelty: not “rules in healthcare” per se―the combination of a SQL-native 
policy compiler + formal coverage/soundness/dedup + decision-space enumeration 
+ multi-dialect generation is distinct from interface/rules engines.

Goal
Given an HL7-derived model (categories + flags + deterministic rules), build a 
Calcite plan once and unparse to multiple SQL dialects to produce “decision 
space” artifacts, preserving ACBP semantics.

What “ACBP semantics” means here
- Categories (finite enums) + derived boolean flags drive a single 
deterministic CASE that yields an action (throttle/alert/escalate/...).
- The compiler guarantees governance properties and keeps decisions auditable 
in-DB.
- SELECT/CASE is the portable core; DDL (CTAS/REPLACE) is wrapped by small 
templates per dialect.

HL7 → categories/flags (representative mapping)
- ADT: message_type='ADT', trigger_event ∈ {A01,A04,A08,A03}, patient_class ∈ 
{E,I,O}
  Flags: is_admission := trigger_event IN ('A01','A04'); is_discharge := 
trigger_event='A03'; is_emergency := patient_class='E'.
- ORM: message_type='ORM', order_priority ∈ {STAT,ROUTINE}, order_type 
(LAB/RX/PROC)
  Flags: is_stat_order := order_priority='STAT'; is_critical_order := 
order_type IN ('LAB','PROC').
- ORU: message_type='ORU', obs_abn_flag (OBX-8: H/L/N), loinc_code
  Flags: abnormal_result := obs_abn_flag IN ('H','L'); critical_analyte := 
loinc_code IN ref_critical_loinc.

MWE (shape representative of ACBP output)
-- Schema (selected fields; staging joins handled upstream)
CREATE TABLE hl7_messages (
  msg_id BIGINT,
  event_ts TIMESTAMP,
  message_type VARCHAR,     -- ADT, ORM, ORU
  trigger_event VARCHAR,    -- A01, O01, R01, ...
  patient_class VARCHAR,    -- PV1-2: E/I/O
  order_priority VARCHAR,   -- STAT, ROUTINE
  loinc_code VARCHAR,
  obs_abn_flag VARCHAR      -- OBX-8: H/L/N
);

-- Deterministic policy (CASE over categories/flags):
SELECT
  msg_id,
  event_ts,
  CASE
    WHEN (message_type = 'ORM' AND order_priority = 'STAT'
          AND obs_abn_flag IN ('H','L')) THEN 3                    -- escalate
    WHEN (message_type = 'ORU' AND obs_abn_flag IN ('H','L')
          AND loinc_code IN (SELECT code FROM ref_critical_loinc)) THEN 3
    WHEN (message_type = 'ORU' AND obs_abn_flag IN ('H','L')) THEN 2 -- alert
    WHEN (message_type = 'ADT' AND trigger_event IN ('A01','A04')
          AND patient_class = 'E') THEN 2
    ELSE 1                                                          -- throttle
  END AS action_id
FROM hl7_messages
WHERE event_ts >= CURRENT_TIMESTAMP - INTERVAL '2' DAY;

Smallest viable integration surface
1) Build relational algebra with RelBuilder (scan/filter/project/CASE; no 
non-portable ops).
2) Convert RelNode → SqlNode via RelToSqlConverter.
3) Unparse with SqlDialect + SqlPrettyWriter per target dialect.
4) Keep DDL templated per dialect (CTAS/OR REPLACE varies); the SELECT is the 
portable core.

Notes on portability
- I can express the time filter as DATEADD/TIMESTAMP_SUB if a dialect prefers 
that form.
- I avoid bitwise logic in the core SELECT; boolean flags are explicit 
predicates.
- Packed masks (if needed for analytics/storage) can be materialized as a 
follow-up step per dialect.

Questions
1) Is RelBuilder → RelToSqlConverter → SqlDialect the recommended path for 
multi-dialect generation today? Any caveats?
2) For interval arithmetic, would you represent a generic DATEADD/TIMESTAMP_SUB 
in the IR and let SqlDialect map it?
3) Guidance for ensuring CASE-heavy categorical logic unparses idiomatically 
across Postgres/BigQuery/ClickHouse/Spark?
4) If we hit a dialect gap, what’s the smallest PR you prefer (tests + changes 
in org.apache.calcite.sql.dialect.*)?

I can provide a tiny repo with:
- input plan construction (RelBuilder),
- expected SQL strings per dialect (golden tests),
- and JIRAs for any concrete gaps we uncover.

Thanks,
Muteb (DotK)
Asia/Riyadh

Note: I’m awaiting subscription confirmation; please keep me CC’d on replies.

Re: [DISCUSS] ACBP × HL7: Calcite IR → multi-dialect SQL for deterministic policy artifacts (Why ACBP, benefits, MWE, questions)

Reply via email to