Thank you for your patient reply. I've realized that some semantics of `looker` are indeed not very reasonable; for example, `bind_filter` clearly violates the properties of closures. Regarding the other points you mentioned, I will reconsider them. Thank you very much! đ¤
Best wishes, Cancai On 2026/05/13 06:45:02 Julian Hyde wrote: > Having worked with a few BI systems â including the MDX and Looker â I > concluded that measures are the key concept. Dimensions, hierarchies, and > metadata like format-string and default sort order are useful but are > secondary. > > Recall that the relational model is an âalgebraâ for data. It contains a > shape (the relation) that all data sets must conform to, and a set of > operations for building queries (thereby deriving data sets from data sets). > Importantly, it is closed: the output of a query has the shape of a relation, > and therefore we can have views (and views on views on views). > > To add measures, one approach is to define a new model â a new shape for data > and new query operations. This is what MDX did, with cubes. The problem is > that the model is not closed (the output of a query does not look exactly > like a cube), and the query operations are not sufficient to build every > possible cube. You need a modeling/mapping language separate from the query > language. Also, MDX doesnât possess essential operations, such as UNION and > DELETE, that would let you do other data activities like ETL and transactions. > > Looker did something similar. Lookerâs results look like tables, have a > little extra metadata (such as format strings) but are sufficiently close to > relations that they can be computed using a single SQL query, and can be used > as input to ETL. But Lookerâs query language is incomplete. It doesnât allow > you to create new measures, join in new tables, union results, or execute > subqueries like IN or EXISTS. > > So, my proposal in the âSQL with Measuresâ paper is to add SQL to the > relational model. The basic data set is a relation that may be zero or more > measures. The query language is SQL, plus some operators to define measures > in queries and to execute calculations in a dimensional context. And it is > closed â the input relations may have measures, and the output relations may > have measures, and starting from a database (tables without measures) you can > create views that have measures by writing queries and storing them as views. > Just as in regular SQL, the views are not strictly necessary because you can > use CTEs in your queries. > > I donât like the term "semantic layerâ because I donât know what âsemanticâ > means and I donât like the idea of an extra layer. If it is a layer separate > and above the relational database, then it is a new query language, and if > that language is not closed, then you canât build views on views. Conversely, > if the layer is just the ability to define and use measures in queries then > it isnât a layer at all. Once youâve built your views, the resulting database > is still a collection of relations (which may or may not have measures). You > can build a bronze-silver-gold medallion architecture with layers of views, > add another layer of views for access control, and keep on going for as many > layers as you like. Thatâs the beauty of a closed modeling/query language. > > > 1. Is a measure-bearing view still best understood as a relation, or as a > > relation plus semantic calculation metadata? > > Itâs a relation-with-measures. The measures can be evaluated (just like the > columns of a regular relational view). âSemantic calculation metadataâ is too > grandiose a term â like any view, it is defined by a query, and the query > language includes can consume and produce measures, just as it can regular > columns. > > > 2. Should `MEASURE` live in SQL schema/plans, or remain external semantic > > metadata expanded before planning? > > External to what? One day (hopefully soon) there will be databases where > measures can be part of view definitions. Until then, Calcite can have views > that contain measures, and those views can be expanded before a query is > generated against a back-end database. Dealing with measures is just > view-expansion, and multi-layer views can just be expanded one layer at a > time. > > (Yes, expanding measures and dimensional calculations correctly is hard. But > itâs easier to do using RelNode-to-RelNode rewrite rules than any other > approach.) > > > 3. If `MEASURE` lives in SQL, how broad should its evaluation context > > become? > > I explained in the paper the idea that the evaluation context of a measure is > a predicate that determines which rows of its table go into the aggregate > function(s). If I want to calculate year-over-year revenue for California I > would evaluate the revenue measure twice, once with âWHERE YEAR(orderDate) = > 2026 AND state = âCAââ, and once with âWHERE YEAR(orderDate) = 2025 AND state > = âCAââ. > > You mention filter-only-fields, which makes me suspect that you are reading > Lookerâs specification. Looker has a few features that break the closure > principle, so we should not attempt to implement those. For example, an > explore might apply different filters internally based on which fields of the > explore were are being projected. Though those features are useful for people > building fancy dashboards they have no place in a query language. > > Some of Lookerâs features could be created by adding attributes [ > https://issues.apache.org/jira/browse/CALCITE-6425 ] and parameter columns > (columns that have the same value for every row of the table, but can be > overridden in an AT clause). > > Julian > > > On May 11, 2026, at 5:56âŻAM, Cancai Cai <[email protected]> wrote: > > > > Hi Calcite community, > > > > I recently asked a related question on [CALCITE-4496][1]. I would like to > > bring the discussion to the dev list because the question is not only about > > one JIRA item or implementation detail, but about the abstraction boundary > > of Measure in SQL itself. > > > > Over the past quarter, I have been exploring semantic layer / semantic > > modeling DSL design. During this process, I looked at adjacent systems and > > ideas such as [Looker][3], [Cube][4], [Malloy][5], [dbt Semantic Layer / > > MetricFlow][6], and Calciteâs measure work. These systems all point to the > > same need: reusable business calculations and governed analytical models. > > But they draw the boundary differently. Some keep measures in an external > > semantic model; Calciteâs SQL measure work moves reusable calculations into > > SQL itself. > > > > That boundary is the part I would like to discuss. > > > > I read [CALCITE-4496][1] and Julian Hyde / John Fremlinâs paper [âMeasures > > in SQLâ][2]. The core motivation makes sense to me. Traditional SQL views > > cannot preserve reusable aggregate calculations very well. Once a view > > turns `AVG`, ratios, margins, or similar calculations into ordinary > > columns, downstream roll-ups can lose the original calculation semantics. > > The classic example is âaverage of averagesâ. > > > > `MEASURE` tries to solve this by making a measure column not just a > > materialized value, but a calculation that can be re-evaluated in the outer > > query context. This is a powerful idea. > > > > My question is about where this abstraction should live. > > > > In traditional SQL, a view can be understood as a relation. It produces > > rows and columns; columns are values. Even if the view definition is > > complex, downstream queries can treat it as a black-box relation. > > > > A measure-bearing view is different. Some columns are regular row values, > > while some columns are deferred aggregate calculations that can be > > re-evaluated in an outer query context. The view is no longer only a > > relation in the traditional value-level sense; it also carries reusable > > calculation semantics. > > > > That leads to a few questions. > > > > 1. Is a measure-bearing view still best understood as a relation, or as a > > relation plus semantic calculation metadata? > > > > In semantic modeling systems, dimensions and measures are different kinds > > of objects. A dimension describes rows and can be selected, grouped, > > filtered, or displayed. A measure describes how to compute an aggregate > > over a set of rows, and its value depends on query context. > > > > If SQL schema contains both regular columns and measure columns, users and > > planners need to understand that they are not the same kind of column. Is > > this the intended mental model for SQL tables with measures? > > > > 2. Should `MEASURE` live in SQL schema/plans, or remain external semantic > > metadata expanded before planning? > > > > For semantic layers built on Calcite, there seem to be two possible > > directions: > > > > ```text > > semantic model -> SQL with MEASURE -> Calcite expands / optimizes measures > > ``` > > > > or: > > > > ```text > > semantic model -> ordinary SQL -> Calcite optimizes relational SQL > > ``` > > > > The first direction gives SQL a native reusable-calculation abstraction. > > The second keeps SQL closer to the traditional relational model, and leaves > > semantic complexity in the modeling layer. > > > > I am trying to understand which direction better matches Calciteâs > > long-term goal. > > > > 3. If `MEASURE` lives in SQL, how broad should its evaluation context > > become? > > > > The paper defines measures using context-sensitive expressions, evaluation > > context, and the `AT` operator. That model is expressive. > > > > In real semantic layer systems, query context often includes more than > > `GROUP BY`. It may include: > > > > - filters before aggregation vs filters after aggregation; > > - joins that may change grain; > > - time dimensions, time grain, and timezone rules; > > - hidden dimensions or filter-only fields; > > - policy filters and query parameters; > > - derived views that preserve or reshape dimensionality. > > > > Should SQL `MEASURE` stay focused on reusable aggregate calculations and > > their evaluation context, while these broader semantic-layer concepts > > remain external metadata? Or is the long-term direction for SQL measures to > > cover more of this modeling context? > > > > For context only, here is a design note from my semantic modeling > > exploration. It is not a Calcite syntax proposal; it is only background for > > why I am thinking about this boundary: > > > > [Designing a Semantic Modeling DSL][7] > > > > I would appreciate feedback on this tradeoff: is `MEASURE` the right > > abstraction boundary for SQL, or should measures remain primarily a > > semantic-layer concept that compiles down to ordinary SQL? > > > > References: > > > > [1]: https://issues.apache.org/jira/browse/CALCITE-4496 > > [2]: https://arxiv.org/pdf/2406.00251 > > [3]: https://cloud.google.com/looker/docs/reference/param-explore > > [4]: https://cube.dev/docs/product/data-modeling/concepts > > [5]: https://docs.malloydata.dev/documentation > > [6]: https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl > > [7]: > > https://github.com/caicancai/semantic-modeling-dsl/blob/main/blog/designing-a-semantic-modeling-dsl-en.md > >
