Hi Askar, I have gone through your email a couple of times. I am trying to understand it bit by bit.
I have a request, please feel free to say no, is there any code base where I can refer and understand how to implement my own filter rule? What I realised I asked wrong is I wanted to implement "(HAVING COUNT(*) > 100)" as a global rule, if there are any aggregation queries. It is a bit difficult for me to understand where I should start. For example, if I follow this https://github.com/zabetak/calcite-tutorial where shall I start looking at? In the below image I feel like some kind of comparison is going on, should I devise my code like this? (in order to implement something like "(HAVING COUNT(*) > 100)") [image: Screenshot 2023-04-11 at 5.38.43 PM.png] Also is there any documentation I can go through regarding how I can traverse through the AST? Probably I am not looking in the right places but so far I could only go through Tab9 code examples (or the documentation provided by the Calcite website) and things did not seem to be clear. I realise it is probably a lot to ask, so whatever you share will be a lot of help for me. Thanks again for your time, patience and help! With regards, Soumyadeep Mukhopadhyay. On Fri, Apr 7, 2023 at 8:33 PM Askar Bozcan <askar.mu...@gmail.com> wrote: > Hey, > You can use Planner.reset(). Note that it has mutable state inside, so do > not reuse the same planner instance in any multi-threading environment. > > (I am assuming you have access to table metadata so that you will be able > to convert your SqlNode tree into RelNode tree, relational expression tree) > - Only return results above a certain threshold when using GROUP BY, for > example (HAVING COUNT(col1) > 100). > > I'm not quite sure I understand this question is HAVING part of the query? > > - Restrict the column on which joins can happen, or else throw an error > (almost like analysisException in Spark) > > Do you have access to table/schema metadata? > If you do: > 1) Convert your parsed syntax tree (SqlNode) into a logical relational tree > (RelNode). > Watch this tutorial by Stamatis: > https://www.youtube.com/watch?v=p1O3E33FIs8. > It will explain way better the usage than I can in an email. (And it what > introduced me to Calcite's basics :)) > 2) Traverse your relational tree by implementing RelShuttle > < > https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/RelShuttle.html > >. > (Look up Visitor pattern to better understand how this recursive traversal > works) > 3) Ignore any RelNode's which are not LogicalJoin. > 4) When you encounter LogicalJoin, traverse its children with getLeft and > getRight > 5) When you encounter LogicalTableScan, You can get its RelOptTable with > getTable, and then RelOptTable.getRowType() to find the fields. > (Not 100% about step 5, but should be close to it) > > > - Restrict the columns that can be passed in a select statement or else > throw an error (like in the point above) > Same logic as above. But instead of TableScan, look at the root > LogicalProject. > > All of the above should be doable with just the parse tree (SqlNode) and > without access to table metadata using SqlShuttle; however, it's easier and > less error-prone with relational tree IMO. > > - Askar > > > On Thu, 6 Apr 2023 at 23:31, Soumyadeep Mukhopadhyay < > soumyamy...@gmail.com> > wrote: > > > Hey Askar, > > > > It worked exactly how you suggested. > > > > ``` > > > > private val sqlQuery: String = "SELECT list_of_columns FROM table_name > > WHERE predicate_column = 'predicate_value'" > > private val frameworkConfig: FrameworkConfig = > > Frameworks.newConfigBuilder.build() > > private val planner: Planner = Frameworks.getPlanner(frameworkConfig) > > private val planner2: Planner = Frameworks.getPlanner(frameworkConfig) > > private val planner3: Planner = Frameworks.getPlanner(frameworkConfig) > > private val sqlNode: SqlNode = planner.parse(sqlQuery) > > println(sqlNode.getKind) > > private val sqlSelectStmt: SqlSelect = sqlNode.asInstanceOf[SqlSelect] > > > > private val setSelectColumnsQuery = "SELECT age" > > private val selectList = > > > planner2.parse(setSelectColumnsQuery).asInstanceOf[SqlSelect].getSelectList > > private val setFromTableQuery = "SELECT employee" > > private val fromTable = > > planner3.parse(setFromTableQuery).asInstanceOf[SqlSelect].getSelectList > > > > sqlSelectStmt.setSelectList(selectList) > > sqlSelectStmt.setFrom(fromTable) > > private val finalQuery = sqlSelectStmt.asInstanceOf[SqlNode] > > println(finalQuery.toSqlString(SnowflakeSqlDialect.DEFAULT)) > > > > ``` > > > > The only caveat I see is the need for a new planner for every new query. > > Should I do something else or is this expected? > > > > On a different note, I wanted to ask about how I can write my own rules. > Is > > it possible to enforce following rules: > > - Only return results above a certain threshold when using GROUP BY, for > > example (HAVING COUNT(col1) > 100). > > - Restrict the column on which joins can happen, or else throw an error > > (almost like analysisException in Spark) > > - Restrict the columns that can be passed in a select statement or else > > throw an error (like in the point above) > > > > Not sure if it is feasible, please feel free to suggest. :) Thanks again > > for your time! > > > > With regards, > > Soumyadeep Mukhopadhyay. > > > > > > On Sun, 2 Apr 2023 at 8:59 PM, Oscar Mulin <askar.mu...@gmail.com> > wrote: > > > > > Hey Soumyadeep, > > > I think that can work with a few caveats. > > > 0) Use the Planner from Frameworks > > > < > > > > > > https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/Frameworks.html > > > > > > > 1) Parse the “template query" into a syntax tree (a root SqlNode) but > do > > > not validate it > > > 2) Cast the root SqlNode into a SqlSelect > > > < > > > > > > https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSelect.html > > > > > > > (can > > > verify that it's a SELECT by using getKind()) > > > 3) Use the setters from SqlSelect to modify the children of root > > SqlSelect > > > node. > > > 4) Unparse the root query back into a string (SqlNode.unparse()). > > > > > > Now the tricky part is 3), as you have to set proper SqlNode types as > > > children of the SELECT node for column names, table names and for WHERE > > > predicate. I don't remember them properly, but what you can do is > parse a > > > proper query (again, don't validate it as you'll then need table > > metadata), > > > and check the kinds (getKind()) of children SqlNode's and replace them > > with > > > your replacements using setters of root SqlSelect node. > > > > > > This should work, but I'm not 100% certain as I'm unable to check right > > > now. > > > > > > PS: Take care about dialects. Dialects are used in unparsing as a > > > "configuration" of SqlWriter, and can unparse the syntax tree > differently > > > based on the dialect you have chosen. > > > > > > Kind regards, > > > Askar Bozcan > > > > > > On 2 Apr 2023, at 14:56, Soumyadeep Mukhopadhyay < > soumyamy...@gmail.com> > > > wrote: > > > > > > Hello All, > > > > > > I have just heard of Apache Calcite and was exploring the > possibilities. > > I > > > wish to achieve the following, and wanted to check if my hunch is > > correct: > > > - Use a template to build SQL queries, like use jinja-sql or even > pebble > > > (interpret the SqlNode tree kind of structure from my template and then > > add > > > the necessary fields like table name and group by fields from an input) > > > > > > So what I am expecting is "SELECT ? FROM ? WHERE ?" would be inside a > > query > > > template (in Jinja-sql it may look like "SELECT {{select_fields | > > sqlsafe}} > > > FROM {{table_name | sqlsafe}} WHERE {{where_clause | sqlsafe}}" and > > values > > > like 'select_fields' would be substituted at run-time from a processing > > > engine like https://pypi.org/project/Jinja2/, but for Java) and the > > output > > > would be "SELECT col1 FROM table1 WHERE col1 IS NOT NULL" with some > > dialect > > > (like Snowflake or Big-Query). > > > > > > Is this possible? Any recommendations or suggestions are welcome. Even > if > > > the approach feels wrong please let me know. :) > > > Thank you for your time and consideration. > > > > > > With regards, > > > Soumyadeep Mukhopadhyay. > > > > > >