We could build a SQL syntax checker using the existing parser logic. Once it detects a SQL expression with the DSL type "griffin-dsl", it would take the following steps:

1. Attempt to delegate the execution of the rule to the "spark-sql" type directly. Whether the execution succeeds or not, run step 2.
2. Notify the user to use "spark-sql" in the future.
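The two steps above could look roughly like the sketch below. This is only an illustration, not Griffin's actual API: `check_and_delegate` and `run_spark_sql` are hypothetical names, and the real fallback path into the legacy griffin-dsl engine is elided.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("griffin-dsl-checker")

def check_and_delegate(rule, run_spark_sql):
    """Try running a 'griffin-dsl' rule as 'spark-sql' first, then always warn.

    `rule` is the rule dict from the job config; `run_spark_sql` is any
    callable that executes the rule text as Spark SQL (stubbed in tests).
    """
    delegated = True
    try:
        # Step 1: delegate execution to the "spark-sql" engine directly.
        run_spark_sql(rule["rule"])
    except Exception as exc:
        delegated = False
        log.warning("spark-sql delegation failed, falling back: %s", exc)
        # ...fall back to the legacy griffin-dsl execution path here...
    # Step 2: whether delegation worked or not, nudge the user to migrate.
    log.warning('dsl.type "griffin-dsl" is deprecated for SQL-like rules; '
                'please switch this rule to "spark-sql".')
    return delegated
```

The return value just indicates whether delegation succeeded, so callers can decide whether the legacy path still needs to run.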
We would keep the checker in the distribution for only a few releases (say, 2 or 3) and then remove it. Another thing I am thinking about is whether we should consider supporting UDFs provided by end users.

On Tue, Jan 29, 2019 at 5:35 PM Nick Sokolov <chemika...@gmail.com> wrote:

> I think we need to maintain backward compatibility or provide easy
> (automated?) migration -- otherwise existing users will be stuck in older
> versions.
>
> On Tue, Jan 29, 2019 at 2:28 PM William Guo <gu...@apache.org> wrote:
>
> > Thanks Grant.
> >
> > I agree Griffin-DSL should leverage spark-sql for the SQL part, and
> > Griffin-DSL should work as a DQ layer to assemble different dimensions,
> > as MLlib does.
> > Since we already have some experience in the data quality domain, it is
> > now time for Griffin-DSL to evolve to the next level.
> >
> > Thanks,
> > William
> >
> > On Wed, Jan 30, 2019 at 5:48 AM Grant <grant.xu...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I would suggest simplifying Griffin-DSL.
> > >
> > > Currently, Griffin supports three types of DSL: spark-sql, griffin-dsl,
> > > and df-ops. In this proposal, I only focus on the first two.
> > >
> > > Griffin-DSL is a SQL-like language that supports a wide range of
> > > clauses, keywords, operators, etc., as Spark SQL does. The class
> > > "GriffinDslParser" also defines how to parse the SQL-like syntax.
> > > Actually, Griffin-DSL's SQL-like syntax could be covered by Spark SQL
> > > completely. Spark 2.0 substantially improved SQL functionality with
> > > SQL:2003 support and can now run all 99 TPC-DS queries.
> > >
> > > So is it possible for Griffin-DSL to remove all SQL-like language
> > > features? All rules that could be expressed in SQL would be categorized
> > > under the "spark-sql" DSL type instead of "griffin-dsl". In this case,
> > > we could simplify the implementation of Griffin-DSL.
> > >
> > > In my understanding, Griffin-DSL should consist of higher-order
> > > expressions, each of which represents a specific set of semantics.
> > > Griffin-DSL would continue focusing on expressions with richer
> > > semantics in the data exploration and wrangling area, and leave all
> > > SQL-compatible expressions to Spark SQL. Griffin-DSL would still be
> > > translated into Spark SQL when executed.
> > >
> > > Here is an example from the unit test "_accuracy-batch-griffindsl.json":
> > >
> > > "evaluate.rule": {
> > >   "rules": [
> > >     {
> > >       "dsl.type": "griffin-dsl",
> > >       "dq.type": "accuracy",
> > >       "out.dataframe.name": "accu",
> > >       "rule": "source.user_id = target.user_id AND
> > > upper(source.first_name) = upper(target.first_name) AND
> > > source.last_name = target.last_name AND source.address = target.address
> > > AND source.email = target.email AND source.phone = target.phone AND
> > > source.post_code = target.post_code",
> > >       "details": {
> > >         "source": "source",
> > >         "target": "target",
> > >         "miss": "miss_count",
> > >         "total": "total_count",
> > >         "matched": "matched_count"
> > >       },
> > >       "out": [
> > >         {
> > >           "type": "record",
> > >           "name": "missRecords"
> > >         }
> > >       ]
> > >     }
> > >   ]
> > > }
> > >
> > > If we move the SQL-like syntax out of Griffin-DSL, the preceding
> > > example would take "dsl.type" as "spark-sql", and "rule" would probably
> > > be a list of columns, or all columns by default.
> > >
> > > Discussions are welcome.
> > >
> > > Grant
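To make the "rule as a list of columns" idea concrete, here is a small sketch of how the hand-written equality condition in the example above could be generated from a column list. The function name `accuracy_rule_from_columns` is hypothetical, not an existing Griffin API; column names and the source/target aliases are taken from the example.

```python
def accuracy_rule_from_columns(columns, source="source", target="target"):
    """Build the equality condition that the griffin-dsl rule above spells
    out by hand, from (column, optional_normalizer) pairs."""
    parts = []
    for col, func in columns:
        lhs, rhs = f"{source}.{col}", f"{target}.{col}"
        if func:  # optional normalization function, e.g. upper()
            lhs, rhs = f"{func}({lhs})", f"{func}({rhs})"
        parts.append(f"{lhs} = {rhs}")
    return " AND ".join(parts)

cols = [("user_id", None), ("first_name", "upper"), ("last_name", None)]
print(accuracy_rule_from_columns(cols))
# source.user_id = target.user_id AND upper(source.first_name) =
# upper(target.first_name) AND source.last_name = target.last_name
```

A migration tool could use something like this to rewrite existing "griffin-dsl" accuracy rules into "spark-sql" ones automatically, which would address the backward-compatibility concern raised earlier in the thread.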