Re: Template SQL

2023-05-27 Thread Soumyadeep Mukhopadhyay
Hello Askar,

Thank you again for the replies! Hope all is well! :)

To answer your questions:
Question: Wanting to use Calcite (Calcite-server in particular) as a "proxy
DB" to
change incoming queries to have HAVING before passing the query to an
underlying database?
Answer: Yes.

Question: Just want to use Calcite to parse a SELECT query, and if it
doesn't have
a HAVING query, modify it and use it again?
Answer: Yes.

Question: In either case, what's your use case?
Answer: I want to apply certain constraints on what the user can actually
query from my database. Let's say I am providing a dataset to a user who
belongs to a different organization and I only want to expose a schema that
I am comfortable with via views. Whenever he or she is going to issue a
query I would want that query to be parsed and validated against the set of
constraints or rules that is important to my organisation but at the
same time I would not want to trigger an error just because someone did not
add a having clause (an example), rather I would try to modify the query
itself and then push the final query.

Question: `Is there a way to generate the RelBuilder version of the same
Sql query?
(SqlToRelConverter?) What do you mean by that?`
Answer: I was hoping there is some API which would produce a version of the
SQL statement which is similar to how one writes queries via RelBuilder.
My thoughts behind this question was there would be some kind of an API
which would convert "SELECT c1 FROM tbl1" into something like
builder .scan("tbl1") .project(builder.field("c1"))
.build(); and vice versa. I don't have an exact use case for this just yet
but what I was imagining to do is use something similar as a template
(let's say as a FreeMarker template) and then chain these templates,
probably via CTEs, and then produce their results (as a SQL string). This
way the developer only needs to worry about how the query would look in
this fashion and not worry much about how to rewrite a no-sql engine based
query into SQL, but I think there may not be too many use cases for that.

I understand and appreciate why "apply a similar approach to a RelRule"
might be an anti-pattern, but, for the sake of trying it out, I want to see
if it is even possible. I know you said you are not too familiar but if you
have any roadmap regarding how to implement any planner rule, please feel
free to chime in!

The following has been on my mind for sometime now but I could not find a
close enough answer for this, please feel free to point to anyone, or not,
if you do not wish to answer this. :)

And lets say I have a table from Snowflake and one in BigQuery, if I try to
join them both using Apache Calcite (the way I envision this would happen
is create a view within Calcite first, for each of these tables, and then
do the join) where will the computation happen? If the answer is somewhat
like this, it would happen in memory and wherever your application is
running, then I might need to build a framework which has the capability to
process the join (maybe over millions of rows) and in a distributed manner
but also leverages the work that has gone into writing the rules in Apache
Calcite. So should I consider some other engine or platform that supports
Apache Calcite like Apache Beam maybe (seems to me like although they use
Calcite I may not be able to deploy my own rules in that engine) or
Hazelcast? I am not sure but I am open to any and all suggestions!

I can not thank you enough for taking the time to hear my concerns, go
through my code and provide feedback. I am more than grateful to you! :)

With regards,
Soumyadeep Mukhopadhyay.

On Sat, May 27, 2023 at 12:05 AM Askar Bozcan  wrote:

> Hello again! Sorry for the late reply.
> Just to be on the same page, I have some questions because I am not 100%
> what you are trying to accomplish.
>
> Are you:
> a) Wanting to use Calcite (Calcite-server in particular) as a "proxy DB" to
> change incoming queries to have HAVING before passing the query to an
> underlying database?
> b) Just want to use Calcite to parse a SELECT query, and if it doesn't have
> a HAVING query, modify it and use it again?
>
> *In either case, what's your use case?*
>
> Regarding answers to some of your questions:
>
> Q: Is it not possible to add a node to a tree when the sql query has been
> parsed to RelNode after validation?
> A: You can, you can set the children of RelNode with
> RelNode.replaceInput (though
> the caveat here, is you're setting a whole sub-tree, so need to be careful
> there)
>
> Q: If I wish to apply a similar approach to a RelRule which I am trying to
> enforce, is that not expected? (an anti-pattern)
> A: Physical plan building is one part of Calcite I'm not too familiar with,
> however I'd argue that it's an antipattern, as those are used for physical
> planning. Physical plans are NOT meant to change the overall relation, and
> are for optimizing the query (adapting the query to run efficiently &
> correctly on differ

Implement a filter rule that adds a predicate

2023-05-27 Thread Soumyadeep Mukhopadhyay
Hello all,

*I was trying to implement a filter rule which can add an extra predicate*,
for example:
*look for GROUP BY clause in a query, check if HAVING is present, if not
add HAVING COUNT(*)>120.*

*I was trying to do the following: *
https://gist.github.com/Soumyadeep-github/47c1131bf02149af995d8e3bccb3ee67

*My approach in this gist *:
get a SqlNode (parsed query) ==> visit each SqlCall node recursively ==>
find SqlSelect (since there's getHaving and setHaving) ==> create a new
dummy query and extract having from the query ==>
set Having for the current query at hand via the dummy query.

*I was wondering if* the same could be achieved in a planner rule. It may
not be a good idea but, for the sake of trying it out, do you think it is
possible? If yes, could you please give me a rough road map.

Thank you for your time and patience!

With regards,
Soumyadeep Mukhopadhyay.