[ 
https://issues.apache.org/jira/browse/KYLIN-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382355#comment-15382355
 ] 

hongbin ma commented on KYLIN-1855:
-----------------------------------

Hi Yanghong,

Correct me if I'm wrong.

Suppose we have a fact table F and two lookup tables, X and Y.
Suppose F has 10000 rows, "F inner join X" produces 9000 rows, and "F inner 
join X inner join Y" produces 8000 rows.
Now we have a model M whose join relationship is "F inner join X inner join Y".
Next we have a cube C that contains only F and X.
When we build cube C, the Hive SQL currently contains "F inner join X 
inner join Y", in order to keep the same "star schema" for all cubes in M. 
So the "real fact table" for cube C will contain 8000 rows.

Now comes the issue: when a user queries "select count(*) from F inner join X", 
it returns 8000, which is wrong from a DBMS perspective (the expected result 
is 9000). A smarter user will query "select count(*) from F inner join X 
inner join Y"; however, Kylin will throw "no realization found" because the 
CubeCapabilityChecker currently sees that cube C does not include table Y. 
That's the issue.
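
To make the row counts above concrete, here is a minimal sketch using an 
in-memory SQLite database in place of Hive. The column names (id, x_id, y_id) 
and the way the rows are generated are assumptions chosen only to reproduce 
the 10000 / 9000 / 8000 figures; they are not taken from any real Kylin model.

```python
import sqlite3

# In-memory stand-in for the Hive tables; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE F (id INTEGER PRIMARY KEY, x_id INTEGER, y_id INTEGER);
    CREATE TABLE X (x_id INTEGER PRIMARY KEY);
    CREATE TABLE Y (y_id INTEGER PRIMARY KEY);
""")

# F has 10000 rows; only 9000 find a match in X and only 8000 find one in Y.
conn.executemany("INSERT INTO F VALUES (?, ?, ?)",
                 [(i, i, i) for i in range(10000)])
conn.executemany("INSERT INTO X VALUES (?)", [(i,) for i in range(9000)])
conn.executemany("INSERT INTO Y VALUES (?)", [(i,) for i in range(8000)])


def count(sql):
    """Run a COUNT(*) query and return the single integer result."""
    return conn.execute(sql).fetchone()[0]


rows_f = count("SELECT COUNT(*) FROM F")
rows_fx = count(
    "SELECT COUNT(*) FROM F INNER JOIN X ON F.x_id = X.x_id")
rows_fxy = count(
    "SELECT COUNT(*) FROM F INNER JOIN X ON F.x_id = X.x_id "
    "INNER JOIN Y ON F.y_id = Y.y_id")

# A DBMS answers "F inner join X" with rows_fx, but a cube built on the
# full-model flat table can only see rows_fxy rows.
print(rows_f, rows_fx, rows_fxy)  # 10000 9000 8000
```

The gap between rows_fx and rows_fxy is exactly the 1000 rows that cube C 
loses when its flat table also joins Y.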

If we want to enforce that all cubes under the same model have a uniform star 
schema, i.e., each cube fully respects all the inner joins, we should relax the 
CubeCapabilityChecker. Otherwise, if we allow some cubes to see a "real fact 
table" of 8000 rows and others one of 9000 rows, then we need to merge 
Yanghong's patch. To me, the first option makes more sense.
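
As a rough illustration of what "relaxing" the check could mean, here is a 
toy sketch. It is not Kylin's CubeCapabilityChecker; the Query class and the 
strict_check / relaxed_check functions are hypothetical names, and the 
relaxed rule shown (joined tables must belong to the model, referenced 
columns must belong to the cube) is only one possible reading of the 
first option.

```python
from dataclasses import dataclass

# Hypothetical model M and cube C from the example in this comment.
MODEL_TABLES = frozenset({"F", "X", "Y"})
CUBE_TABLES = frozenset({"F", "X"})


@dataclass(frozen=True)
class Query:
    joined_tables: frozenset   # tables that appear in the FROM/JOIN clause
    column_tables: frozenset   # tables whose columns the query references


def strict_check(query, cube_tables):
    """Behaviour described above: every joined table must be in the cube."""
    return query.joined_tables <= cube_tables


def relaxed_check(query, cube_tables, model_tables):
    """Assumed relaxation: joins may use any model table, as long as the
    columns actually referenced come from tables the cube includes."""
    return (query.joined_tables <= model_tables
            and query.column_tables <= cube_tables)


# "select count(*) from F inner join X inner join Y" references no Y columns.
count_fxy = Query(joined_tables=frozenset({"F", "X", "Y"}),
                  column_tables=frozenset({"F", "X"}))

# A query that groups by a column of Y cannot be served by cube C either way.
group_by_y = Query(joined_tables=frozenset({"F", "X", "Y"}),
                   column_tables=frozenset({"F", "Y"}))

print(strict_check(count_fxy, CUBE_TABLES))                  # False
print(relaxed_check(count_fxy, CUBE_TABLES, MODEL_TABLES))   # True
print(relaxed_check(group_by_y, CUBE_TABLES, MODEL_TABLES))  # False
```

Under this reading, the "smarter user" query is matched to cube C instead of 
failing with "no realization found", while queries that really need columns 
from Y are still rejected.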

> Should exclude those joins in whose related lookup tables no dimensions are 
> used in cube
> ----------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1855
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1855
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>         Attachments: exclude_unused_joins.patch
>
>
> A cube is based on a model in which a star schema is defined. In some cases, 
> the cube utilizes only a few lookup tables rather than all. In this case, 
> when creating the sql for the flat table, those lookup tables should not be 
> included. Otherwise, it will confuse users when query. If users do query 
> according to the definition of the flat table, error of no realization will 
> occur due to lack of the related join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
