Re: [DISCUSS] Questions about SQL MEASURE as an abstraction boundary

Julian Hyde Tue, 12 May 2026 23:45:34 -0700

Having worked with a few BI systems — including the MDX and Looker — I 
concluded that measures are the key concept.  Dimensions, hierarchies, and 
metadata like format-string and default sort order are useful but are secondary.

Recall that the relational model is an “algebra” for data. It contains a shape 
(the relation) that all data sets must conform to, and a set of operations for 
building queries (thereby deriving data sets from data sets). Importantly, it 
is closed: the output of a query has the shape of a relation, and therefore we 
can have views (and views on views on views).

To add measures, one approach is to define a new model — a new shape for data 
and new query operations. This is what MDX did, with cubes. The problem is that 
the model is not closed (the output of a query does not look exactly like a 
cube), and the query operations are not sufficient to build every possible 
cube. You need a modeling/mapping language separate from the query language. 
Also, MDX doesn’t possess essential operations, such as UNION and DELETE, that 
would let you do other data activities like ETL and transactions.

Looker did something similar. Looker’s results look like tables, have a little 
extra metadata (such as format strings) but are sufficiently close to relations 
that they can be computed using a single SQL query, and can be used as input to 
ETL. But Looker’s query language is incomplete. It doesn’t allow you to create 
new measures, join in new tables, union results, or execute subqueries like IN 
or EXISTS.

So, my proposal in the “SQL with Measures” paper is to add SQL to the 
relational model. The basic data set is a relation that may be zero or more 
measures. The query language is SQL, plus some operators to define measures in 
queries and to execute calculations in a dimensional context. And it is closed 
— the input relations may have measures, and the output relations may have 
measures, and starting from a database (tables without measures) you can create 
views that have measures by writing queries and storing them as views. Just as 
in regular SQL, the views are not strictly necessary because you can use CTEs 
in your queries.

I don’t like the term "semantic layer” because I don’t know what “semantic” 
means and I don’t like the idea of an extra layer. If it is a layer separate 
and above the relational database, then it is a new query language, and if that 
language is not closed, then you can’t build views on views. Conversely, if the 
layer is just the ability to define and use measures in queries then it isn’t a 
layer at all. Once you’ve built your views, the resulting database is still a 
collection of relations (which may or may not have measures). You can build a 
bronze-silver-gold medallion architecture with layers of views, add another 
layer of views for access control, and keep on going for as many layers as you 
like. That’s the beauty of a closed modeling/query language.

> 1. Is a measure-bearing view still best understood as a relation, or as a 
> relation plus semantic calculation metadata?

It’s a relation-with-measures. The measures can be evaluated (just like the 
columns of a regular relational view). “Semantic calculation metadata” is too 
grandiose a term — like any view, it is defined by a query, and the query 
language includes can consume and produce measures, just as it can regular 
columns.

> 2. Should `MEASURE` live in SQL schema/plans, or remain external semantic 
> metadata expanded before planning?

External to what? One day (hopefully soon) there will be databases where 
measures can be part of view definitions. Until then, Calcite can have views 
that contain measures, and those views can be expanded before a query is 
generated against a back-end database. Dealing with measures is just 
view-expansion, and multi-layer views can just be expanded one layer at a time.

(Yes, expanding measures and dimensional calculations correctly is hard. But 
it’s easier to do using RelNode-to-RelNode rewrite rules than any other 
approach.)

> 3. If `MEASURE` lives in SQL, how broad should its evaluation context become?

I explained in the paper the idea that the evaluation context of a measure is a 
predicate that determines which rows of its table go into the aggregate 
function(s). If I want to calculate year-over-year revenue for California I 
would evaluate the revenue measure twice, once with “WHERE YEAR(orderDate) = 
2026 AND state = ‘CA’”, and once with “WHERE YEAR(orderDate) = 2025 AND state = 
‘CA’”.

You mention filter-only-fields, which makes me suspect that you are reading 
Looker’s specification. Looker has a few features that break the closure 
principle, so we should not attempt to implement those. For example, an explore 
might apply different filters internally based on which fields of the explore 
were are being projected. Though those features are useful for people building 
fancy dashboards they have no place in a query language.

Some of Looker’s features could be created by adding attributes [ 
https://issues.apache.org/jira/browse/CALCITE-6425 ] and parameter columns 
(columns that have the same value for every row of the table, but can be 
overridden in an AT clause).

Julian

> On May 11, 2026, at 5:56 AM, Cancai Cai <[email protected]> wrote:
> 
> Hi Calcite community,
> 
> I recently asked a related question on [CALCITE-4496][1]. I would like to
> bring the discussion to the dev list because the question is not only about
> one JIRA item or implementation detail, but about the abstraction boundary
> of Measure in SQL itself.
> 
> Over the past quarter, I have been exploring semantic layer / semantic
> modeling DSL design. During this process, I looked at adjacent systems and
> ideas such as [Looker][3], [Cube][4], [Malloy][5], [dbt Semantic Layer /
> MetricFlow][6], and Calcite’s measure work. These systems all point to the
> same need: reusable business calculations and governed analytical models.
> But they draw the boundary differently. Some keep measures in an external
> semantic model; Calcite’s SQL measure work moves reusable calculations into
> SQL itself.
> 
> That boundary is the part I would like to discuss.
> 
> I read [CALCITE-4496][1] and Julian Hyde / John Fremlin’s paper [“Measures
> in SQL”][2]. The core motivation makes sense to me. Traditional SQL views
> cannot preserve reusable aggregate calculations very well. Once a view
> turns `AVG`, ratios, margins, or similar calculations into ordinary
> columns, downstream roll-ups can lose the original calculation semantics.
> The classic example is “average of averages”.
> 
> `MEASURE` tries to solve this by making a measure column not just a
> materialized value, but a calculation that can be re-evaluated in the outer
> query context. This is a powerful idea.
> 
> My question is about where this abstraction should live.
> 
> In traditional SQL, a view can be understood as a relation. It produces
> rows and columns; columns are values. Even if the view definition is
> complex, downstream queries can treat it as a black-box relation.
> 
> A measure-bearing view is different. Some columns are regular row values,
> while some columns are deferred aggregate calculations that can be
> re-evaluated in an outer query context. The view is no longer only a
> relation in the traditional value-level sense; it also carries reusable
> calculation semantics.
> 
> That leads to a few questions.
> 
> 1. Is a measure-bearing view still best understood as a relation, or as a
> relation plus semantic calculation metadata?
> 
> In semantic modeling systems, dimensions and measures are different kinds
> of objects. A dimension describes rows and can be selected, grouped,
> filtered, or displayed. A measure describes how to compute an aggregate
> over a set of rows, and its value depends on query context.
> 
> If SQL schema contains both regular columns and measure columns, users and
> planners need to understand that they are not the same kind of column. Is
> this the intended mental model for SQL tables with measures?
> 
> 2. Should `MEASURE` live in SQL schema/plans, or remain external semantic
> metadata expanded before planning?
> 
> For semantic layers built on Calcite, there seem to be two possible
> directions:
> 
> ```text
> semantic model -> SQL with MEASURE -> Calcite expands / optimizes measures
> ```
> 
> or:
> 
> ```text
> semantic model -> ordinary SQL -> Calcite optimizes relational SQL
> ```
> 
> The first direction gives SQL a native reusable-calculation abstraction.
> The second keeps SQL closer to the traditional relational model, and leaves
> semantic complexity in the modeling layer.
> 
> I am trying to understand which direction better matches Calcite’s
> long-term goal.
> 
> 3. If `MEASURE` lives in SQL, how broad should its evaluation context
> become?
> 
> The paper defines measures using context-sensitive expressions, evaluation
> context, and the `AT` operator. That model is expressive.
> 
> In real semantic layer systems, query context often includes more than
> `GROUP BY`. It may include:
> 
> - filters before aggregation vs filters after aggregation;
> - joins that may change grain;
> - time dimensions, time grain, and timezone rules;
> - hidden dimensions or filter-only fields;
> - policy filters and query parameters;
> - derived views that preserve or reshape dimensionality.
> 
> Should SQL `MEASURE` stay focused on reusable aggregate calculations and
> their evaluation context, while these broader semantic-layer concepts
> remain external metadata? Or is the long-term direction for SQL measures to
> cover more of this modeling context?
> 
> For context only, here is a design note from my semantic modeling
> exploration. It is not a Calcite syntax proposal; it is only background for
> why I am thinking about this boundary:
> 
> [Designing a Semantic Modeling DSL][7]
> 
> I would appreciate feedback on this tradeoff: is `MEASURE` the right
> abstraction boundary for SQL, or should measures remain primarily a
> semantic-layer concept that compiles down to ordinary SQL?
> 
> References:
> 
> [1]: https://issues.apache.org/jira/browse/CALCITE-4496
> [2]: https://arxiv.org/pdf/2406.00251
> [3]: https://cloud.google.com/looker/docs/reference/param-explore
> [4]: https://cube.dev/docs/product/data-modeling/concepts
> [5]: https://docs.malloydata.dev/documentation
> [6]: https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl
> [7]:
> https://github.com/caicancai/semantic-modeling-dsl/blob/main/blog/designing-a-semantic-modeling-dsl-en.md

Re: [DISCUSS] Questions about SQL MEASURE as an abstraction boundary

Reply via email to