Re: [DISCUSS] Questions about SQL MEASURE as an abstraction boundary

Cancai Cai Thu, 14 May 2026 08:12:20 -0700

Thank you for your patient reply. I've realized that some semantics of `looker` 
are indeed not very reasonable; for example, `bind_filter` clearly violates the 
properties of closures. Regarding the other points you mentioned, I will 
reconsider them. Thank you very much! 🤔


Best wishes,
Cancai

On 2026/05/13 06:45:02 Julian Hyde wrote:
> Having worked with a few BI systems — including the MDX and Looker — I 
> concluded that measures are the key concept.  Dimensions, hierarchies, and 
> metadata like format-string and default sort order are useful but are 
> secondary.
> 
> Recall that the relational model is an “algebra” for data. It contains a 
> shape (the relation) that all data sets must conform to, and a set of 
> operations for building queries (thereby deriving data sets from data sets). 
> Importantly, it is closed: the output of a query has the shape of a relation, 
> and therefore we can have views (and views on views on views).
> 
> To add measures, one approach is to define a new model — a new shape for data 
> and new query operations. This is what MDX did, with cubes. The problem is 
> that the model is not closed (the output of a query does not look exactly 
> like a cube), and the query operations are not sufficient to build every 
> possible cube. You need a modeling/mapping language separate from the query 
> language. Also, MDX doesn’t possess essential operations, such as UNION and 
> DELETE, that would let you do other data activities like ETL and transactions.
> 
> Looker did something similar. Looker’s results look like tables, have a 
> little extra metadata (such as format strings) but are sufficiently close to 
> relations that they can be computed using a single SQL query, and can be used 
> as input to ETL. But Looker’s query language is incomplete. It doesn’t allow 
> you to create new measures, join in new tables, union results, or execute 
> subqueries like IN or EXISTS.
> 
> So, my proposal in the “SQL with Measures” paper is to add SQL to the 
> relational model. The basic data set is a relation that may be zero or more 
> measures. The query language is SQL, plus some operators to define measures 
> in queries and to execute calculations in a dimensional context. And it is 
> closed — the input relations may have measures, and the output relations may 
> have measures, and starting from a database (tables without measures) you can 
> create views that have measures by writing queries and storing them as views. 
> Just as in regular SQL, the views are not strictly necessary because you can 
> use CTEs in your queries.
> 
> I don’t like the term "semantic layer” because I don’t know what “semantic” 
> means and I don’t like the idea of an extra layer. If it is a layer separate 
> and above the relational database, then it is a new query language, and if 
> that language is not closed, then you can’t build views on views. Conversely, 
> if the layer is just the ability to define and use measures in queries then 
> it isn’t a layer at all. Once you’ve built your views, the resulting database 
> is still a collection of relations (which may or may not have measures). You 
> can build a bronze-silver-gold medallion architecture with layers of views, 
> add another layer of views for access control, and keep on going for as many 
> layers as you like. That’s the beauty of a closed modeling/query language.
> 
> > 1. Is a measure-bearing view still best understood as a relation, or as a 
> > relation plus semantic calculation metadata?
> 
> It’s a relation-with-measures. The measures can be evaluated (just like the 
> columns of a regular relational view). “Semantic calculation metadata” is too 
> grandiose a term — like any view, it is defined by a query, and the query 
> language includes can consume and produce measures, just as it can regular 
> columns.
> 
> > 2. Should `MEASURE` live in SQL schema/plans, or remain external semantic 
> > metadata expanded before planning?
> 
> External to what? One day (hopefully soon) there will be databases where 
> measures can be part of view definitions. Until then, Calcite can have views 
> that contain measures, and those views can be expanded before a query is 
> generated against a back-end database. Dealing with measures is just 
> view-expansion, and multi-layer views can just be expanded one layer at a 
> time.
> 
> (Yes, expanding measures and dimensional calculations correctly is hard. But 
> it’s easier to do using RelNode-to-RelNode rewrite rules than any other 
> approach.)
> 
> > 3. If `MEASURE` lives in SQL, how broad should its evaluation context 
> > become?
> 
> I explained in the paper the idea that the evaluation context of a measure is 
> a predicate that determines which rows of its table go into the aggregate 
> function(s). If I want to calculate year-over-year revenue for California I 
> would evaluate the revenue measure twice, once with “WHERE YEAR(orderDate) = 
> 2026 AND state = ‘CA’”, and once with “WHERE YEAR(orderDate) = 2025 AND state 
> = ‘CA’”.
> 
> You mention filter-only-fields, which makes me suspect that you are reading 
> Looker’s specification. Looker has a few features that break the closure 
> principle, so we should not attempt to implement those. For example, an 
> explore might apply different filters internally based on which fields of the 
> explore were are being projected. Though those features are useful for people 
> building fancy dashboards they have no place in a query language.
> 
> Some of Looker’s features could be created by adding attributes [ 
> https://issues.apache.org/jira/browse/CALCITE-6425 ] and parameter columns 
> (columns that have the same value for every row of the table, but can be 
> overridden in an AT clause).
> 
> Julian
> 
> > On May 11, 2026, at 5:56 AM, Cancai Cai <[email protected]> wrote:
> > 
> > Hi Calcite community,
> > 
> > I recently asked a related question on [CALCITE-4496][1]. I would like to
> > bring the discussion to the dev list because the question is not only about
> > one JIRA item or implementation detail, but about the abstraction boundary
> > of Measure in SQL itself.
> > 
> > Over the past quarter, I have been exploring semantic layer / semantic
> > modeling DSL design. During this process, I looked at adjacent systems and
> > ideas such as [Looker][3], [Cube][4], [Malloy][5], [dbt Semantic Layer /
> > MetricFlow][6], and Calcite’s measure work. These systems all point to the
> > same need: reusable business calculations and governed analytical models.
> > But they draw the boundary differently. Some keep measures in an external
> > semantic model; Calcite’s SQL measure work moves reusable calculations into
> > SQL itself.
> > 
> > That boundary is the part I would like to discuss.
> > 
> > I read [CALCITE-4496][1] and Julian Hyde / John Fremlin’s paper [“Measures
> > in SQL”][2]. The core motivation makes sense to me. Traditional SQL views
> > cannot preserve reusable aggregate calculations very well. Once a view
> > turns `AVG`, ratios, margins, or similar calculations into ordinary
> > columns, downstream roll-ups can lose the original calculation semantics.
> > The classic example is “average of averages”.
> > 
> > `MEASURE` tries to solve this by making a measure column not just a
> > materialized value, but a calculation that can be re-evaluated in the outer
> > query context. This is a powerful idea.
> > 
> > My question is about where this abstraction should live.
> > 
> > In traditional SQL, a view can be understood as a relation. It produces
> > rows and columns; columns are values. Even if the view definition is
> > complex, downstream queries can treat it as a black-box relation.
> > 
> > A measure-bearing view is different. Some columns are regular row values,
> > while some columns are deferred aggregate calculations that can be
> > re-evaluated in an outer query context. The view is no longer only a
> > relation in the traditional value-level sense; it also carries reusable
> > calculation semantics.
> > 
> > That leads to a few questions.
> > 
> > 1. Is a measure-bearing view still best understood as a relation, or as a
> > relation plus semantic calculation metadata?
> > 
> > In semantic modeling systems, dimensions and measures are different kinds
> > of objects. A dimension describes rows and can be selected, grouped,
> > filtered, or displayed. A measure describes how to compute an aggregate
> > over a set of rows, and its value depends on query context.
> > 
> > If SQL schema contains both regular columns and measure columns, users and
> > planners need to understand that they are not the same kind of column. Is
> > this the intended mental model for SQL tables with measures?
> > 
> > 2. Should `MEASURE` live in SQL schema/plans, or remain external semantic
> > metadata expanded before planning?
> > 
> > For semantic layers built on Calcite, there seem to be two possible
> > directions:
> > 
> > ```text
> > semantic model -> SQL with MEASURE -> Calcite expands / optimizes measures
> > ```
> > 
> > or:
> > 
> > ```text
> > semantic model -> ordinary SQL -> Calcite optimizes relational SQL
> > ```
> > 
> > The first direction gives SQL a native reusable-calculation abstraction.
> > The second keeps SQL closer to the traditional relational model, and leaves
> > semantic complexity in the modeling layer.
> > 
> > I am trying to understand which direction better matches Calcite’s
> > long-term goal.
> > 
> > 3. If `MEASURE` lives in SQL, how broad should its evaluation context
> > become?
> > 
> > The paper defines measures using context-sensitive expressions, evaluation
> > context, and the `AT` operator. That model is expressive.
> > 
> > In real semantic layer systems, query context often includes more than
> > `GROUP BY`. It may include:
> > 
> > - filters before aggregation vs filters after aggregation;
> > - joins that may change grain;
> > - time dimensions, time grain, and timezone rules;
> > - hidden dimensions or filter-only fields;
> > - policy filters and query parameters;
> > - derived views that preserve or reshape dimensionality.
> > 
> > Should SQL `MEASURE` stay focused on reusable aggregate calculations and
> > their evaluation context, while these broader semantic-layer concepts
> > remain external metadata? Or is the long-term direction for SQL measures to
> > cover more of this modeling context?
> > 
> > For context only, here is a design note from my semantic modeling
> > exploration. It is not a Calcite syntax proposal; it is only background for
> > why I am thinking about this boundary:
> > 
> > [Designing a Semantic Modeling DSL][7]
> > 
> > I would appreciate feedback on this tradeoff: is `MEASURE` the right
> > abstraction boundary for SQL, or should measures remain primarily a
> > semantic-layer concept that compiles down to ordinary SQL?
> > 
> > References:
> > 
> > [1]: https://issues.apache.org/jira/browse/CALCITE-4496
> > [2]: https://arxiv.org/pdf/2406.00251
> > [3]: https://cloud.google.com/looker/docs/reference/param-explore
> > [4]: https://cube.dev/docs/product/data-modeling/concepts
> > [5]: https://docs.malloydata.dev/documentation
> > [6]: https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl
> > [7]:
> > https://github.com/caicancai/semantic-modeling-dsl/blob/main/blog/designing-a-semantic-modeling-dsl-en.md
> 
>

Re: [DISCUSS] Questions about SQL MEASURE as an abstraction boundary

Reply via email to