I'm hitting an issue and am not able to understand clearly what the root cause is.
I have customer,time dimension and a transaction fact table. Customer table has 1MM rows and transaction fact 154MM. I have to find topN customers by transaction volume for a year. I ran the following query in Hive for say year 2015. TXN_AMT in transaction fact table is of type decimal(32,8) SELECT CUSTOMER_DIM_T.NM, SUM(TXN_AMT) TOTAL_TXN_AMT FROM TXN_FCT_T as TXN_FCT_T INNER JOIN CUSTOMER_DIM_T as CUSTOMER_DIM_T ON TXN_FCT_T.CST_KEY = CUSTOMER_DIM_T.KEY INNER JOIN DATE_DIM_T as DATE_DIM_T ON TXN_FCT_T.TXN_BK_DT_KEY = DATE_DIM_T.DT_KEY WHERE DATE_DIM_T.CAL_YR=2015 GROUP BY CUSTOMER_DIM_T.NM ORDER BY TOTAL_TXN_AMT DESC LIMIT 10; CUST AMT ==================== A 177070809652.52 B 156918629669.59 C 145634838958.87 D 137781561987.28 E 137470272887.12 F 136782827759.93 G 129986897552.65 H 127105433950.33 I 115152934891.42 J 107505491520.64 The same query if run in Kylin, I get a totally different answer what is bothering me is as I begin to increase the LIMIT number, the response keeps on changing. I keep on getting a different set of customers with higher txn amts. CUST AMT ==================== K 4014698676 L 2727082526 M 1344354210 N 1216966910 O 554963079.2 P 390827714.4 Q 367123639.6 R 347732036.1 S 313686532 T 311987500.6 Aren't the results first sorted then limited in Kylin? If I run the query in Kylin filtering on customer A above, the cuboid seems to have the right aggregated data SELECT CUSTOMER_DIM_T.NM, SUM(TXN_AMT) TOTAL_TXN_AMT FROM TXN_FCT_T as TXN_FCT_T INNER JOIN CUSTOMER_DIM_T as CUSTOMER_DIM_T ON TXN_FCT_T.CST_KEY = CUSTOMER_DIM_T.KEY INNER JOIN DATE_DIM_T as DATE_DIM_T ON TXN_FCT_T.TXN_BK_DT_KEY = DATE_DIM_T.DT_KEY WHERE DATE_DIM_T.CAL_YR=2015 and */CUSTOMER_DIM_T.NM='A'/* GROUP BY CUSTOMER_DIM_T.NM ORDER BY TOTAL_TXN_AMT DESC LIMIT 10; CUST AMT ==================== A 177070809652.52 Any ideas? Thanks, -- View this message in context: http://apache-kylin.74782.x6.nabble.com/TopN-Results-Differ-in-Hive-and-Kylin-tp3288.html Sent from the Apache Kylin mailing list archive at Nabble.com.
