In my efforts to continue learning about the internals of Calcite, I've been experimenting with applying Calcite's Babel parser to the task of extracting fully qualified field references from a query. The use case here would be to enumerate all the fields of a schema that are "touched" by a given query, for the purpose of informing table owners about the usage by consumers of particular fields.
It seems there has been some prior interest in this use case, but the few of the stackoverflow posts I could find on the topic are likely outdated. One post that has proved helpful was this one from Julian <https://stackoverflow.com/a/37554628/215608>, which explains the roles of namespaces and scopes in the validation process. The comments in that thread mention SqlScopedShuttle only maintaining a stack of SqlNode, but it seems it has changed to keep a stack of SqlValidatorScopes? For the most part, I seem to be able to use that Deque to match identifiers to their tables, following on an example drawn from the getFieldOrigins method of SqlValidatorImpl <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlValidator.html#getFieldOrigins(org.apache.calcite.sql.SqlNode)>. My shuttle just uses the visit method for SqlIdentifier to try to qualify the identifier against the namespace associated with the current scope, discarding all that fail to be found. Here's my really naaive approach with some test cases expressed as yaml <https://gist.github.com/DeaconDesperado/416f11bf91d0ca60b3c30b22626f4178>. Obviously there's some cases that fail badly with this approach, one of which I've included wherein an UNNEST is used on an array field containing subrecords. A few questions I have: - Does this approach make sense for this use case, or would it make more sense to use an implementation of SqlValidator, which seems to maintain more state as to the scopes + namespaces? - Would it be easier to approach this particular use case from navigating the generated relational algebra rather than the SQL AST? - Does the static factory method for the shuttle make sense idiomatically with the intended usage for SqlScopedShuttle? How typically should one obtain the intialScope? I've tried to enumerate both the top level SqlSelect as well as the potential for a list of CTEs using WITH Many thanks!