Thanks for all the helpful answers. I will check out VXQuery next. Algebricks seems like the equivalent of LLVM for query languages. I am wondering whether Algebricks is powerful enough to map any query language, be it graph-based, relational or hierarchical (MRQL, SQL, Pregel). Is there a formal proof of this expressive power? Will there always be a one-to-one correspondence between the plan trees of different languages, or could there be cases where the translation has to look at whole sub-trees rather than at one node at a time?
-Sandeep

On Mon, Feb 15, 2016 at 4:21 AM, Mike Carey wrote:
> PS: There's an important point below that you shouldn't miss, Sandeep, if
> you look at the Hivesterix code. If you find its approach puzzling, note
> that it was designed to add only what was needed to run Hive queries on
> Hyracks, and so that it could potentially be kept in upper-level sync with
> Hive itself. As a result, it was not done as a "Hive lookalike done right";
> it was done as a "Hive lookalike that lets the existing Hive code do as
> much of the initial work as possible".
>
> On 2/14/16 2:48 PM, Yingyi Bu wrote:
>> Hi Sandeep,
>>
>> Here is the Hivesterix codebase in the Apache source tree:
>>
>> https://github.com/apache/incubator-asterixdb-hyracks/tree/fullstack-0.2.13
>>
>> We maintained Hivesterix up to hyracks-0.2.13, but stopped maintaining it
>> after that release. Mike has elaborated on the reason.
>>
>>> Furthermore, none of these rewrite rules seem to be SQL-specific. Are
>>> there any SQL-specific rewrite rules which were added?
>>
>> That's exactly the motivation of the Algebricks project: most rules that
>> a typical SQL compiler implements are not SQL-specific :-)
>> However, there are indeed a few Hive-specific rules that I added in order
>> to get the Hive-on-Algebricks plan to work efficiently:
>>
>> https://github.com/apache/incubator-asterixdb-hyracks/tree/fullstack-0.2.13/hivesterix/hivesterix-optimizer/src/main/java/edu/uci/ics/hivesterix/optimizer/rules
>>
>> The Hivesterix implementation first translates a Hive-optimized MR plan
>> into an Algebricks logical plan, then lets Algebricks do further
>> optimizations, and finally executes the resulting Hyracks job on the
>> Hyracks runtime.
>>
>> Best,
>> Yingyi
>>
>> On Sun, Feb 14, 2016 at 2:26 PM, Mike Carey wrote:
>>> Sandeep,
>>>
>>> Just to chime in as well:
>>>
>>> - VXQuery is indeed probably the best example to look at to understand
>>> the AsterixDB/Algebricks separation.
>>>
>>> - Hivesterix was built by Yingyi Bu (who'll see this) early on; it drove
>>> the separation idea, actually, but we made a decision not to try to
>>> maintain it. It was intended to provide a third, different proof of the
>>> separation and applicability of the approach from a research standpoint,
>>> but it doesn't have additional value to offer the world (since Hive
>>> itself is a moving target, and Hive on Tez now provides the
>>> non-MapReduce-runtime value that Hivesterix initially offered). Yingyi
>>> would probably be happy to share the code base with you if you wanted to
>>> look at it for any reason, but the only things in the Apache AsterixDB
>>> (incubating) project are things deemed worthy of engineering/maintenance
>>> work.
>>>
>>> Hope that helps too!
>>>
>>> Cheers,
>>> Mike
>>>
>>> On 2/14/16 11:47 AM, Till Westmann wrote:
>>>> Hi Sandeep,
>>>>
>>>> Apache VXQuery, the XQuery implementation mentioned in the SoCC paper,
>>>> is a separate project [1].
>>>>
>>>> Specifically, regarding your questions:
>>>>
>>>> 1) There is no need to implement other projects that use Algebricks
>>>> inside the AsterixDB source tree (as VXQuery shows).
>>>>
>>>> 2) It is clearly easier to combine a Java parser and plan tree generator
>>>> with Algebricks, but there's no reason why one couldn't connect
>>>> components written in other languages (e.g., by using a text-based
>>>> intermediate format between the parser and the optimizer and between
>>>> the plan generator and the runtime).
>>>>
>>>> 3) The reason for the different sets of rules is that some are
>>>> language-agnostic and some are language-specific.
>>>> As you can see in Figure 2 of the paper, a language implementation has
>>>> to provide language-specific rules to augment the language-agnostic
>>>> rules provided by Algebricks. Specifically, the rules in AsterixDB's
>>>> asterix-algebra project augment the rules in Algebricks to support
>>>> AsterixDB's query language, AQL.
>>>>
>>>> Hope this helps,
>>>> Till
>>>>
>>>> [1] http://vxquery.apache.org
>>>>
>>>> On 14 Feb 2016, at 11:02, Sandeep Joshi wrote:
>>>>> I had some questions about the process of mapping other query languages
>>>>> to Algebricks. The SIGMOD SoCC '15 paper mentions two languages, XQuery
>>>>> and HiveQL, that have been mapped to Algebricks, but their
>>>>> implementations are not found in either of the two repositories
>>>>> released under Apache.
>>>>>
>>>>> I found Hivesterix and Pregelix under
>>>>>
>>>>> https://github.com/madhusudancs/hyracks/tree/master/fullstack/hivesterix
>>>>>
>>>>> I couldn't find the XQuery-to-Algebricks translator anywhere. Has it
>>>>> been released?
>>>>>
>>>>> Why are these language translators not part of the Apache repository?
>>>>>
>>>>> The Apache repositories contain the language translators for AQL and
>>>>> SQL. After comparing the implementations of Hivesterix and SQL/AQL, I
>>>>> have some questions:
>>>>>
>>>>> 1) Does one have to integrate the parser for a new language within the
>>>>> Apache AsterixDB source tree, or can one build the Algebricks
>>>>> translator outside the Apache tree and invoke the Hyracks job execution
>>>>> engine directly, as is being done in the Hivesterix implementation seen
>>>>> here:
>>>>>
>>>>> https://github.com/madhusudancs/hyracks/blob/36bb1021b17b736aa1648bd439e1246ae419aa89/fullstack/hivesterix/hivesterix-dist/src/main/java/edu/uci/ics/hivesterix/runtime/exec/HyracksExecutionEngine.java
>>>>>
>>>>> 2) When a query language is converted to Algebricks, the
>>>>> ICompilerFactory converts one plan tree to another by calling
>>>>> Visitor::visit() on each node of the source query. Does this imply that
>>>>> the plan tree for the source language can only be constructed in Java?
>>>>> Would it be difficult or impossible to integrate into Algebricks a
>>>>> parser and plan tree generator written in another language?
>>>>>
>>>>> 3) In the Apache repositories, the query rewrite rules used during
>>>>> optimization are found in two different repositories.
>>>>>
>>>>> One is in the main asterixdb repository:
>>>>>
>>>>> https://github.com/apache/incubator-asterixdb/tree/master/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules
>>>>>
>>>>> and the other is in the hyracks repository:
>>>>>
>>>>> https://github.com/apache/incubator-asterixdb-hyracks/tree/master/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules
>>>>>
>>>>> Are these two sets of rules characteristically different, or is this
>>>>> duplication just an artifact of rapid prototyping?
>>>>>
>>>>> Furthermore, none of these rewrite rules seem to be SQL-specific. Are
>>>>> there any SQL-specific rewrite rules which were added?
>>>>>
>>>>> -Sandeep
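Till's point 3 and Yingyi's note about the Hive-specific rules describe the same layering. A language front end hands the optimizer a logical plan, and the optimizer then runs the framework's language-agnostic rules together with whatever language-specific rules that front end contributes. The Python sketch below illustrates only that idea. The Op class, the operator kinds (SCAN, SELECT, CALL), the rule functions, GENERIC_RULES, MY_LANG_RULES and the dataset(...) call are all hypothetical and deliberately simplified. This is not the Algebricks API, which is written in Java and has its own rule interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Op:
    """A toy logical operator (hypothetical; not an Algebricks class)."""
    kind: str                      # e.g. "SCAN", "SELECT", "CALL"
    arg: object = None             # predicate, dataset name, or call spec
    inputs: List["Op"] = field(default_factory=list)


# A rule looks at one operator and returns a replacement, or None if it does not apply.
Rule = Callable[[Op], Optional[Op]]


# --- Language-agnostic rules (hypothetical): shared by every front end. ---
def remove_true_select(op: Op) -> Optional[Op]:
    # SELECT(True, x) -> x
    if op.kind == "SELECT" and op.arg is True and op.inputs:
        return op.inputs[0]
    return None


def merge_adjacent_selects(op: Op) -> Optional[Op]:
    # SELECT(p1, SELECT(p2, x)) -> SELECT(AND(p1, p2), x)
    if op.kind == "SELECT" and op.inputs and op.inputs[0].kind == "SELECT":
        child = op.inputs[0]
        return Op("SELECT", ("AND", op.arg, child.arg), list(child.inputs))
    return None


# --- Language-specific rule (hypothetical): contributed by one imaginary front end. ---
def dataset_call_to_scan(op: Op) -> Optional[Op]:
    # This imaginary language spells data access as a call dataset("X").
    # The generic rules know nothing about it, so the front end maps it to SCAN.
    if op.kind == "CALL" and isinstance(op.arg, tuple) and op.arg[0] == "dataset":
        return Op("SCAN", op.arg[1])
    return None


GENERIC_RULES: List[Rule] = [remove_true_select, merge_adjacent_selects]
MY_LANG_RULES: List[Rule] = [dataset_call_to_scan]


def rewrite_once(op: Op, rules: List[Rule]) -> Tuple[Op, bool]:
    """One bottom-up pass: rewrite the inputs first, then try each rule on this node.

    Returns (new_op, changed). The input tree is never mutated.
    """
    changed = False
    new_inputs = []
    for child in op.inputs:
        new_child, child_changed = rewrite_once(child, rules)
        new_inputs.append(new_child)
        changed = changed or child_changed
    op = Op(op.kind, op.arg, new_inputs)
    for rule in rules:
        result = rule(op)
        if result is not None:
            return result, True
    return op, changed


def optimize(plan: Op, rules: List[Rule], max_passes: int = 100) -> Op:
    """Repeat passes until no rule fires (a fixpoint) or max_passes is reached."""
    for _ in range(max_passes):
        plan, changed = rewrite_once(plan, rules)
        if not changed:
            break
    return plan


if __name__ == "__main__":
    plan = Op("SELECT", ("=", "age", 42),
              [Op("SELECT", True, [Op("CALL", ("dataset", "Customers"))])])
    print(optimize(plan, MY_LANG_RULES + GENERIC_RULES))
    print(optimize(plan, GENERIC_RULES))
```

Calling optimize(plan, GENERIC_RULES) on the same plan removes the trivial SELECT but leaves the CALL node untouched. That leftover node is exactly the gap a language-specific rule set fills. Keeping the two lists separate is what would let several front ends reuse one generic rule list.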
