Leonidas Fegaras created MRQL-33:
------------------------------------
Summary: Fix various bugs in iteration queries
Key: MRQL-33
URL: https://issues.apache.org/jira/browse/MRQL-33
Project: MRQL
Issue Type: Bug
Components: Query Optimization
Reporter: Leonidas Fegaras
Assignee: Leonidas Fegaras
This is a major patch that fixes many errors related to MRQL iteration queries
(repeat-queries) and optimizes matrix operations. Matrix factorization
(queries/factorization.mrql) is now highly optimized. Here are the changes:
1) New groupByJoin interface: Instead of a combiner and a mapper, it now uses
an accumulator with a left zero value. A groupByJoin is a join followed by a
groupBy that generalizes matrix multiplication. It is implemented as a single
map-reduce job, based on Valiant's BSP algorithm.
2) New algebraic optimization rules that generate groupByJoins.
3) The compiler was extended to compile functions on persistent collections
efficiently. A persistent collection is a sequence file in map-reduce mode or
an RDD in Spark mode. These persistent collections no longer have to be
materialized in memory before function calls.
4) Global variable bindings are now passed as configuration parameters instead
of being substituted into the code as values.
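
To illustrate item 1, here is a minimal in-memory sketch in Python of how a
join followed by a groupBy, folded with an accumulator and a left zero value,
generalizes matrix multiplication. It is not the MRQL implementation and does
not model the single map-reduce job; the function group_by_join and all of its
parameter names are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def group_by_join(left, right, left_key, right_key,
                  group_key, combine, accumulate, zero):
    """Join left and right on equal keys, group the joined pairs by
    group_key, and fold each group with accumulate, starting from zero.
    (Hypothetical helper for illustration only.)"""
    # Index the right input by its join key.
    index = defaultdict(list)
    for r in right:
        index[right_key(r)].append(r)

    result = {}
    for l in left:
        for r in index.get(left_key(l), []):
            g = group_key(l, r)
            # The running value is the left argument of the accumulator;
            # a group that has not been seen yet starts from zero.
            result[g] = accumulate(result.get(g, zero), combine(l, r))
    return result

# Sparse matrices as (row, column, value) triples.
A = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
B = [(0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)]

# Matrix multiplication: join A's column with B's row, group by
# (row of A, column of B), multiply the values, and sum the products.
C = group_by_join(
    A, B,
    left_key=lambda a: a[1],
    right_key=lambda b: b[0],
    group_key=lambda a, b: (a[0], b[1]),
    combine=lambda a, b: a[2] * b[2],
    accumulate=lambda acc, x: acc + x,
    zero=0.0,
)
```

Swapping combine, accumulate, and zero for other operations (for example
addition, min, and infinity) gives other join-then-aggregate computations of
the same shape.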
I am attaching the patch next.