We have a similar problem at LinkedIn, where we need to analyze a large number
of Pig scripts to find common subplans. Our approach is to parse those scripts,
convert them into Calcite logical plans, and then compare those plans to find
common subtrees. I currently have an intern working on basic library code for
comparing RexNodes and RelNodes (for example, two projects that differ only in
column order are treated as equal).
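
To make the comparison idea concrete, here is a minimal sketch. It does not use
Calcite's RelNode or RexNode classes; PlanNode, digest(), and CommonSubplans are
hypothetical names invented for illustration. Each subtree gets a canonical
string digest (Project columns are sorted so that column order does not matter),
and subtrees are counted by digest across all plans.

```java
import java.util.*;

// Toy plan node. This is NOT Calcite's RelNode; all names are hypothetical.
final class PlanNode {
    final String op;             // e.g. "Project", "Filter", "Scan"
    final List<String> args;     // column names, predicate text, or a table name
    final List<PlanNode> inputs; // child plans

    PlanNode(String op, List<String> args, PlanNode... inputs) {
        this.op = op;
        this.args = args;
        this.inputs = Arrays.asList(inputs);
    }

    // Canonical digest of this subtree. Project columns are sorted,
    // so two projects that differ only in column order get the same digest.
    String digest() {
        List<String> a = new ArrayList<>(args);
        if (op.equals("Project")) {
            Collections.sort(a);
        }
        StringBuilder sb = new StringBuilder(op).append(a).append('(');
        for (int i = 0; i < inputs.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(inputs.get(i).digest());
        }
        return sb.append(')').toString();
    }
}

final class CommonSubplans {
    // Counts how often each subtree digest occurs across all plans.
    static Map<String, Integer> count(List<PlanNode> plans) {
        Map<String, Integer> counts = new HashMap<>();
        for (PlanNode p : plans) {
            visit(p, counts);
        }
        return counts;
    }

    private static void visit(PlanNode n, Map<String, Integer> counts) {
        counts.merge(n.digest(), 1, Integer::sum);
        for (PlanNode in : n.inputs) {
            visit(in, counts);
        }
    }
}
```

Digests with a count greater than one are candidate common subplans. The toy
refers to columns by name; in a real logical plan, parent expressions typically
refer to input columns by position, so reordering a projection would also mean
remapping those references.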
________________________________
From: Devjyoti Patra <devjyo...@qubole.com>
Sent: Wednesday, July 25, 2018 10:28 PM
To: dev@calcite.apache.org
Subject: Re: SQL Query Set Analyzer

Hi Zheng,

At Qubole, we are building something very similar to what you are looking
for. From experience, I can tell you that it is a lot easier to build than
one might think.
We use the Calcite parser to parse the SQL into a SqlNode tree and then use
different tree visitors to extract query attributes such as tables, filter
columns, joins, and subqueries.
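
As a rough illustration of the visitor approach, here is a sketch over a toy
AST. The types Ast, AstVisitor, Table, Join, Select, and AttributeCollector are
hypothetical stand-ins, not Calcite classes; with Calcite one would instead
walk the parsed SqlNode tree with a visitor such as SqlBasicVisitor. The
collector gathers tables, filter columns, and join keys, descending into
subqueries.

```java
import java.util.*;

// Toy AST. Hypothetical stand-ins for a parsed SQL tree, not Calcite classes.
interface Ast {
    void accept(AstVisitor v);
}

interface AstVisitor {
    void table(String name);
    void filterColumn(String name);
    void joinKey(String left, String right);
}

final class Table implements Ast {
    final String name;
    Table(String name) { this.name = name; }
    public void accept(AstVisitor v) { v.table(name); }
}

final class Join implements Ast {
    final Ast left, right;
    final String leftKey, rightKey; // columns of an equi-join condition
    Join(Ast left, Ast right, String leftKey, String rightKey) {
        this.left = left; this.right = right;
        this.leftKey = leftKey; this.rightKey = rightKey;
    }
    public void accept(AstVisitor v) {
        left.accept(v);
        right.accept(v);
        v.joinKey(leftKey, rightKey);
    }
}

final class Select implements Ast {
    final Ast from;                   // a Table, a Join, or a nested Select
    final List<String> whereColumns;  // columns referenced in WHERE
    Select(Ast from, List<String> whereColumns) {
        this.from = from; this.whereColumns = whereColumns;
    }
    public void accept(AstVisitor v) {
        from.accept(v); // descends into subqueries as well
        for (String c : whereColumns) {
            v.filterColumn(c);
        }
    }
}

// Collects query attributes in sorted sets so results are deterministic.
final class AttributeCollector implements AstVisitor {
    final Set<String> tables = new TreeSet<>();
    final Set<String> filterColumns = new TreeSet<>();
    final Set<String> joinKeys = new TreeSet<>();

    public void table(String name) { tables.add(name); }
    public void filterColumn(String name) { filterColumns.add(name); }
    public void joinKey(String l, String r) {
        // Order the pair so that a=b and b=a are recorded identically.
        joinKeys.add(l.compareTo(r) <= 0 ? l + "=" + r : r + "=" + l);
    }
}
```

Running one collector per query and aggregating the resulting sets across the
whole workload gives simple frequency counts for tables, filter columns, and
join keys.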

Our approach is very similar to Uber's QueryParser project (
https://github.com/uber/queryparser ), but we go deeper in our analysis of
finding queries that are semantically similar to some canonicalized form.
If you intend to begin from scratch, I can give you some pointers to get
started.

Thanks,
Devjyoti


On Thu, Jul 26, 2018 at 9:37 AM, Zheng Shao <zsh...@gmail.com> wrote:

> Hi,
>
> We are thinking about starting a project to analyze a huge number of SQL
> queries (think millions) to identify common patterns:
> * Common sub queries
> * Common filtering conditions (columns) for a table
> * Common join keys for table pairs
>
> Are there any existing projects in that direction using Calcite?  We would
> love to leverage one instead of building from scratch.
>
> Zheng
>
