I'm hitting an issue and am not able to understand clearly what the root
cause is.  

I have customer,time dimension and a transaction fact table.  Customer table
has 1MM rows and transaction fact 154MM.

I have to find topN customers by transaction volume for a year. I ran the
following query in Hive for say year 2015.

TXN_AMT in transaction fact table is of type decimal(32,8)

SELECT
  CUSTOMER_DIM_T.NM, SUM(TXN_AMT) TOTAL_TXN_AMT  
FROM TXN_FCT_T as TXN_FCT_T 
INNER JOIN CUSTOMER_DIM_T as CUSTOMER_DIM_T
ON TXN_FCT_T.CST_KEY = CUSTOMER_DIM_T.KEY
INNER JOIN DATE_DIM_T as DATE_DIM_T
ON TXN_FCT_T.TXN_BK_DT_KEY = DATE_DIM_T.DT_KEY
WHERE DATE_DIM_T.CAL_YR=2015
GROUP BY
CUSTOMER_DIM_T.NM
ORDER BY TOTAL_TXN_AMT DESC LIMIT 10;

CUST    AMT
====================
A     177070809652.52
B     156918629669.59
C     145634838958.87
D     137781561987.28
E     137470272887.12
F     136782827759.93
G     129986897552.65
H     127105433950.33
I     115152934891.42
J     107505491520.64

The same query if run in Kylin, I get a totally different answer what is
bothering me is as I begin to increase the LIMIT number, the response keeps
on changing.  I keep on getting a different set of customers with higher txn
amts.

CUST    AMT
====================
K     4014698676
L     2727082526
M     1344354210
N     1216966910
O     554963079.2
P     390827714.4
Q     367123639.6
R     347732036.1
S     313686532
T     311987500.6

Aren't the results first sorted then limited in Kylin?

If I run the query in Kylin filtering on customer A above, the cuboid seems
to have the right aggregated data

SELECT
  CUSTOMER_DIM_T.NM, SUM(TXN_AMT) TOTAL_TXN_AMT  
FROM TXN_FCT_T as TXN_FCT_T 
INNER JOIN CUSTOMER_DIM_T as CUSTOMER_DIM_T
ON TXN_FCT_T.CST_KEY = CUSTOMER_DIM_T.KEY
INNER JOIN DATE_DIM_T as DATE_DIM_T
ON TXN_FCT_T.TXN_BK_DT_KEY = DATE_DIM_T.DT_KEY
WHERE DATE_DIM_T.CAL_YR=2015 and */CUSTOMER_DIM_T.NM='A'/*
GROUP BY
CUSTOMER_DIM_T.NM
ORDER BY TOTAL_TXN_AMT DESC LIMIT 10;


CUST    AMT
====================
A     177070809652.52

Any ideas?

Thanks,

--
View this message in context: 
http://apache-kylin.74782.x6.nabble.com/TopN-Results-Differ-in-Hive-and-Kylin-tp3288.html
Sent from the Apache Kylin mailing list archive at Nabble.com.

Reply via email to