[
https://issues.apache.org/jira/browse/LENS-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743967#comment-15743967
]
Puneet Gupta commented on LENS-1381:
------------------------------------
Design for this requirement:
*Current rewrite flow*
- Currently the rewrite flow relies on Set<CandidateFact> and
Set<Set<CandidateFact>>, which represent the participating Facts and the
combinations of Facts (in case of a join between 2 or more facts) that can
answer the user query, respectively.
- Set<CandidateFact> is initially populated on the assumption that all Facts
will participate, and Set<Set<CandidateFact>> is created based on the joins
that are required to answer the query (with the assumption that two facts can
be joined if they have the dimensions that are being queried by the user. After
joining the facts, the queried measures, which are split across facts, are
picked). Along the rewrite flow the above data structures are pruned based on
column availability, data availability, storage validity, fact validity, cost,
etc. At the end, a final CandidateFact combination is picked from
Set<Set<CandidateFact>>.
- To write the rewritten query for the picked candidate combination, one of the
following contexts is created:
-- SingleFactSingleStorageHQLContext (Candidate combination has a single fact
and a single storage)
-- SingleFactMultiStorageHQLContext (Candidate combination has a single fact
and multiple storages within that fact - Union Query)
-- MultiFactHQLContext (Candidate combination has multiple facts - Join Query)
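The three-way choice above can be sketched roughly as follows. This is only an illustration: HqlContextKind, ContextChooser and choose() are hypothetical names, not Lens classes, and the input is simplified to the number of storages picked per fact in the winning combination.

```java
import java.util.List;

// Hypothetical stand-in for the three HQL context classes named above.
enum HqlContextKind {
    SINGLE_FACT_SINGLE_STORAGE,  // plain query
    SINGLE_FACT_MULTI_STORAGE,   // union query
    MULTI_FACT                   // join query
}

final class ContextChooser {
    // storagesPerFact.get(i) is the number of storages picked for the
    // i-th fact of the winning combination (an assumed representation).
    static HqlContextKind choose(List<Integer> storagesPerFact) {
        if (storagesPerFact.isEmpty()) {
            throw new IllegalArgumentException("empty candidate combination");
        }
        if (storagesPerFact.size() > 1) {
            return HqlContextKind.MULTI_FACT;
        }
        return storagesPerFact.get(0) > 1
            ? HqlContextKind.SINGLE_FACT_MULTI_STORAGE
            : HqlContextKind.SINGLE_FACT_SINGLE_STORAGE;
    }
}
```

Note that in this scheme a union is only possible inside a single fact, which is the limitation this JIRA addresses.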
*New Flow*
# The new flow will work at Storage level and will use a list of
StorageCandidates. Initially all Storages are candidates.
# The list of StorageCandidates is pruned based on column availability, storage
validity, fact validity, update period validity, etc.
# The StorageCandidates are then grouped to ensure that a group can cover the
entire time range queried by the user. It's possible for a group to have a
single StorageCandidate in case this storage alone can fulfill the time ranges
queried. If a group has more than one storage, then this group is represented
as a UnionCandidate.
# The groups created in step 3 (UnionCandidates and StorageCandidates) are
used to find a measure-covering group such that the members of this group cover
all the measures queried by the user. Again, it's possible for this group to
have a single member (which can be a StorageCandidate or a UnionCandidate) that
can answer all the measures. If the group has more than one member, then that
group is represented as a JoinCandidate.
# JoinCandidate, UnionCandidate and StorageCandidate extend the same Candidate
interface.
# The groups created in step 4 are further pruned based on data availability,
cost, etc., and a winning group (Candidate) is picked.
# The query is then written for this winning Candidate.
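A minimal sketch of how the three candidate types in the steps above could share one interface. Only the names Candidate, StorageCandidate, UnionCandidate and JoinCandidate come from this design; every field and method here (measures(), coversRange(), the long-valued time bounds) is an assumption made for illustration and not the actual Lens implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Assumed common contract: which measures can be answered, and whether
// the half-open time range [start, end) is fully covered.
interface Candidate {
    Set<String> measures();
    boolean coversRange(long start, long end);
}

// A single storage with data for [from, to).
final class StorageCandidate implements Candidate {
    final String name; final long from; final long to;
    private final Set<String> measures;
    StorageCandidate(String name, long from, long to, String... measures) {
        this.name = name; this.from = from; this.to = to;
        this.measures = new HashSet<>(Arrays.asList(measures));
    }
    public Set<String> measures() { return measures; }
    public boolean coversRange(long s, long e) { return from <= s && e <= to; }
}

// Step 3: storages that together cover the queried time range.
final class UnionCandidate implements Candidate {
    private final List<StorageCandidate> children;
    UnionCandidate(StorageCandidate... children) { this.children = Arrays.asList(children); }
    // Only measures available in every member can be answered by the union.
    public Set<String> measures() {
        Set<String> common = new HashSet<>(children.get(0).measures());
        for (StorageCandidate c : children) common.retainAll(c.measures());
        return common;
    }
    // Sweep a cursor forward through members that contain it.
    public boolean coversRange(long s, long e) {
        long cursor = s;
        boolean advanced = true;
        while (cursor < e && advanced) {
            advanced = false;
            for (StorageCandidate c : children) {
                if (c.from <= cursor && c.to > cursor) { cursor = c.to; advanced = true; }
            }
        }
        return cursor >= e;
    }
}

// Step 4: members that together cover all queried measures.
final class JoinCandidate implements Candidate {
    private final List<Candidate> children;
    JoinCandidate(Candidate... children) { this.children = Arrays.asList(children); }
    public Set<String> measures() {
        Set<String> all = new HashSet<>();
        for (Candidate c : children) all.addAll(c.measures());
        return all;
    }
    // Every joined member must cover the whole range on its own.
    public boolean coversRange(long s, long e) {
        for (Candidate c : children) if (!c.coversRange(s, e)) return false;
        return true;
    }
}
```

With this shape, a JoinCandidate can hold UnionCandidates whose storages belong to different facts, which is the fact-to-fact union this issue asks for.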
> Support Fact to Fact Union
> --------------------------
>
> Key: LENS-1381
> URL: https://issues.apache.org/jira/browse/LENS-1381
> Project: Apache Lens
> Issue Type: New Feature
> Reporter: Puneet Gupta
>
> Currently Lens supports Union-ing data across different storages in a single
> Fact. With this JIRA Lens server will be able to Union Data Across Facts too.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)