We could build a SQL syntax checker using the existing parser logic. Once it detects a SQL expression with the DSL type "griffin-dsl", it would take the following steps:

1. Attempt to delegate the execution of the rule to the "spark-sql" type directly. Whether the execution succeeds or not, run step 2.
2. Notify the user to use "spark-sql" in the future.
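The two steps above could look roughly like the sketch below. This is only an illustration, not Griffin's actual API: `check_and_delegate` and `run_spark_sql` are hypothetical names, and the real fallback path into the legacy griffin-dsl engine is elided.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("griffin-dsl-checker")

def check_and_delegate(rule, run_spark_sql):
    """Try running a 'griffin-dsl' rule as 'spark-sql' first, then always warn.

    `rule` is the rule dict from the job config; `run_spark_sql` is any
    callable that executes the rule text as Spark SQL (stubbed in tests).
    """
    delegated = True
    try:
        # Step 1: delegate execution to the "spark-sql" engine directly.
        run_spark_sql(rule["rule"])
    except Exception as exc:
        delegated = False
        log.warning("spark-sql delegation failed, falling back: %s", exc)
        # ...fall back to the legacy griffin-dsl execution path here...
    # Step 2: whether delegation worked or not, nudge the user to migrate.
    log.warning('dsl.type "griffin-dsl" is deprecated for SQL-like rules; '
                'please switch this rule to "spark-sql".')
    return delegated
```

The return value just indicates whether delegation succeeded, so callers can decide whether the legacy path still needs to run.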
We would keep the checker in the distribution for only a few releases (say, 2 or 3) and then remove it. Another thing I am thinking about is whether we should consider supporting UDFs provided by end users.

On Tue, Jan 29, 2019 at 5:35 PM Nick Sokolov <chemika...@gmail.com> wrote:

> I think we need to maintain backward compatibility or provide easy
> (automated?) migration -- otherwise existing users will be stuck in older
> versions.
>
> On Tue, Jan 29, 2019 at 2:28 PM William Guo <gu...@apache.org> wrote:
>
> > Thanks Grant.
> >
> > I agree Griffin-DSL should leverage spark-sql for the SQL part, and
> > Griffin-DSL should work as a DQ layer to assemble different dimensions,
> > as MLlib does.
> > Since we already have some experience in the data quality domain, it is
> > now time for Griffin-DSL to evolve to the next level.
> >
> > Thanks,
> > William
> >
> > On Wed, Jan 30, 2019 at 5:48 AM Grant <grant.xu...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I would suggest simplifying Griffin-DSL.
> > >
> > > Currently, Griffin supports three types of DSL: spark-sql, griffin-dsl,
> > > and df-ops. In this proposal, I only focus on the first two.
> > >
> > > Griffin-DSL is a SQL-like language that supports a wide range of
> > > clauses, keywords, operators, etc., as Spark SQL does. The class
> > > "GriffinDslParser" also defines how to parse the SQL-like syntax.
> > > Actually, Griffin-DSL's SQL-like syntax could be covered by Spark SQL
> > > completely. Spark 2.0 substantially improved SQL functionality with
> > > SQL:2003 support and can now run all 99 TPC-DS queries.
> > >
> > > So is it possible for Griffin-DSL to remove all SQL-like language
> > > features? All rules that could be expressed in SQL would be categorized
> > > under the "spark-sql" DSL type instead of "griffin-dsl". In this case,
> > > we could simplify the implementation of Griffin-DSL.
> > >
> > > In my understanding, Griffin-DSL should consist of higher-order
> > > expressions, each of which represents a specific set of semantics.
> > > Griffin-DSL would continue focusing on expressions with richer
> > > semantics in the data exploration and wrangling area, and leave all
> > > SQL-compatible expressions to Spark SQL. Griffin-DSL would still be
> > > translated into Spark SQL when executed.
> > >
> > > Here is an example from the unit test "_accuracy-batch-griffindsl.json":
> > >
> > > "evaluate.rule": {
> > >   "rules": [
> > >     {
> > >       "dsl.type": "griffin-dsl",
> > >       "dq.type": "accuracy",
> > >       "out.dataframe.name": "accu",
> > >       "rule": "source.user_id = target.user_id AND
> > > upper(source.first_name) = upper(target.first_name) AND
> > > source.last_name = target.last_name AND source.address = target.address
> > > AND source.email = target.email AND source.phone = target.phone AND
> > > source.post_code = target.post_code",
> > >       "details": {
> > >         "source": "source",
> > >         "target": "target",
> > >         "miss": "miss_count",
> > >         "total": "total_count",
> > >         "matched": "matched_count"
> > >       },
> > >       "out": [
> > >         {
> > >           "type": "record",
> > >           "name": "missRecords"
> > >         }
> > >       ]
> > >     }
> > >   ]
> > > }
> > >
> > > If we move the SQL-like syntax out of Griffin-DSL, the preceding
> > > example would take "dsl.type" as "spark-sql", and "rule" would probably
> > > be a list of columns, or all columns by default.
> > >
> > > Discussions are welcome.
> > >
> > > Grant
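To make the "rule as a list of columns" idea concrete, here is a small sketch of how the hand-written equality condition in the example above could be generated from a column list. The function name `accuracy_rule_from_columns` is hypothetical, not an existing Griffin API; column names and the source/target aliases are taken from the example.

```python
def accuracy_rule_from_columns(columns, source="source", target="target"):
    """Build the equality condition that the griffin-dsl rule above spells
    out by hand, from (column, optional_normalizer) pairs."""
    parts = []
    for col, func in columns:
        lhs, rhs = f"{source}.{col}", f"{target}.{col}"
        if func:  # optional normalization function, e.g. upper()
            lhs, rhs = f"{func}({lhs})", f"{func}({rhs})"
        parts.append(f"{lhs} = {rhs}")
    return " AND ".join(parts)

cols = [("user_id", None), ("first_name", "upper"), ("last_name", None)]
print(accuracy_rule_from_columns(cols))
# source.user_id = target.user_id AND upper(source.first_name) =
# upper(target.first_name) AND source.last_name = target.last_name
```

A migration tool could use something like this to rewrite existing "griffin-dsl" accuracy rules into "spark-sql" ones automatically, which would address the backward-compatibility concern raised earlier in the thread.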