JESSE CHEN created SPARK-13864: ---------------------------------- Summary: TPCDS query 74 returns wrong results compared to TPC official result set Key: SPARK-13864 URL: https://issues.apache.org/jira/browse/SPARK-13864 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: JESSE CHEN
Testing Spark SQL using TPC queries. Query 66 returns wrong results compared to official result set. This is at 1GB SF (validation run). Aggregations slightly off -- eg. JAN_SALES column of "Doors canno" row - SparkSQL returns 6355232.185385704, expected 6355232.31 Actual results: [null,null,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,9597806.850651741,1.1121820530080795E7,8670867.81564045,8994785.945689201,1.088724806326294E7,1.4187671518377304E7,9732598.460139751,1.9798897020946026E7,2.1007842467959404E7,2.149551364927292E7,3.479566905774999E7,3.3122997954660416E7,null,null,null,null,null,null,null,null,null,null,null,null,2.1913594697555542E7,3.2518476414670944E7,2.48856624883976E7,2.5698343830046654E7,3.373591080598068E7,3.552703167087555E7,2.5465193481492043E7,5.362323870799959E7,5.1409986978201866E7,5.415917383586836E7,9.222704311805725E7,8.343539111531019E7] [Bad cards must make.,621234,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,9506753.593884468,8008140.429557085,6116769.711647987,1.1973045160133362E7,7756254.925520897,5352978.574095726,1.373399613500309E7,1.6418794411203384E7,1.7212743279764652E7,1.704270732417488E7,3.43049358570323E7,3.532416421229005E7,15.30301560102066,12.890698882477594,9.846160563729589,19.273003667109915,12.485238936569628,8.61668642427125,22.107605403121994,26.429323590150222,27.707342611261865,27.433635834765774,55.22063482847413,56.86128610521969,3.0534943928382874E7,2.4481686250203133E7,2.217871080008793E7,2.569579825610423E7,2.995490355044937E7,1.8084140250833035E7,3.0805576178061485E7,4.7156887432252884E7,5.115858869637826E7,5.5759943171424866E7,8.625354428184557E7,8.345155532035494E7] [Conventional childr,977787,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,8860645.460736752,1.441581376543355E7,6761497.232810497,1.1820654735879421E7,8246260.600341797,6636877.482845306,1.1434492123092413E7,2.5673812070380323E7,2.3074206999911785E7,2.1834582007320404E7,2.6894900596512794E7,3.357509177109933E7,9.061938296108202,14.743306840276613,6.9151024024767125,12.08919195681618,8.43359606984118,6.787651587559771,11.694256645969329,26.257060147435304,23.598398219562938,22.330611889215547,27.505888906799534,34.337838170377935,2.3836085704864502E7,3.20733132298584E7,2.503790437837982E7,2.2659895963564873E7,2.175740087420273E7,2.4451608012176514E7,2.1933001734852314E7,5.59967034604629E7,5.737188052299309E7,6.208721474336243E7,8.284991027382469E7,8.897031933202875E7] [Doors canno,294242,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,6355232.185385704,1.0198920296742141E7,1.0246200903741479E7,1.2209716492156029E7,8566998.262890816,8806316.75278151,9789405.6993227,1.646658496404171E7,2.6443785668474197E7,2.701604788320923E7,3.366058958298761E7,2.7462468750599384E7,21.59865751791282,34.66167405313361,34.822360178837414,41.495491779406166,29.115484067165177,29.928823053070296,33.26991285854059,55.96272783641258,89.87087386734116,91.81574310672585,114.39763726112386,93.33293258813964,2.2645142994330406E7,2.448725452685547E7,2.4925759290207863E7,3.0503655031727314E7,2.6558160276379585E7,2.0976233452690125E7,2.9895796101181984E7,5.600219855566597E7,5.348815865275085E7,7.628723580410767E7,8.248374754962921E7,8.808826726185608E7] [Important issues liv,138504,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,1.1748784594717264E7,1.435130566355586E7,9896470.867572784,7990874.805492401,8879247.840401173,7362383.04259038,1.0011144724414349E7,1.7741201390372872E7,2.1346976135887742E7,1.8074978020030975E7,2.967512567988676E7,3.2545325348875403E7,84.8263197793368,103.6165429414014,71.45259969078715,57.694180713137534,64.10824120892663,53.156465102743454,72.28054586448297,128.09161750110374,154.12534032149065,130.5014874662896,214.25464737398747,234.97751219369408,2.7204167203903973E7,2.598037822457385E7,1.9943398915802002E7,2.5710421112384796E7,1.948448105346489E7,2.6346611484448195E7,2.5075158296625137E7,5.409477817043829E7,4.106673223178029E7,5.454705814340496E7,7.246596285337901E7,9.277032812079096E7] Expected results: +----------------------+-------------------+----------+-------------------+---------+---------------+---------------+------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+ | W_WAREHOUSE_NAME | W_WAREHOUSE_SQ_FT | W_CITY | W_COUNTY | W_STATE | W_COUNTRY | SHIP_CARRIERS | YEAR | JAN_SALES | FEB_SALES | MAR_SALES | APR_SALES | MAY_SALES | JUN_SALES | JUL_SALES | AUG_SALES | SEP_SALES | OCT_SALES | NOV_SALES | DEC_SALES | JAN_SALES_PER_SQ_FOOT | FEB_SALES_PER_SQ_FOOT | MAR_SALES_PER_SQ_FOOT | APR_SALES_PER_SQ_FOOT | MAY_SALES_PER_SQ_FOOT | JUN_SALES_PER_SQ_FOOT | JUL_SALES_PER_SQ_FOOT | AUG_SALES_PER_SQ_FOOT | SEP_SALES_PER_SQ_FOOT | OCT_SALES_PER_SQ_FOOT | NOV_SALES_PER_SQ_FOOT | DEC_SALES_PER_SQ_FOOT | JAN_NET | FEB_NET | MAR_NET | APR_NET | MAY_NET | JUN_NET | JUL_NET | AUG_NET | SEP_NET | OCT_NET | NOV_NET | DEC_NET | +----------------------+-------------------+----------+-------------------+---------+---------------+---------------+------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+ | Bad cards must make. | 621234 | Fairview | Williamson County | TN | United States | DHL,BARIAN | 2001 | 9506753.46000 | 8008140.33000 | 6116769.63000 | 11973045.15000 | 7756254.92000 | 5352978.49000 | 13733996.10000 | 16418794.37000 | 17212743.32000 | 17042707.41000 | 34304935.61000 | 35324164.21000 | 15.30302 | 12.89070 | 9.84616 | 19.27300 | 12.48524 | 8.61669 | 22.10761 | 26.42932 | 27.70734 | 27.43364 | 55.22063 | 56.86129 | 30534943.77000 | 24481685.94000 | 22178710.81000 | 25695798.18000 | 29954903.78000 | 18084140.05000 | 30805576.13000 | 47156887.22000 | 51158588.86000 | 55759942.80000 | 86253544.16000 | 83451555.63000 | | Conventional childr | 977787 | Fairview | Williamson County | TN | United States | DHL,BARIAN | 2001 | 8860645.55000 | 14415813.74000 | 6761497.23000 | 11820654.76000 | 8246260.69000 | 6636877.49000 | 11434492.25000 | 25673812.14000 | 23074206.96000 | 21834581.94000 | 26894900.53000 | 33575091.74000 | 9.06194 | 14.74331 | 6.91510 | 12.08919 | 8.43360 | 6.78765 | 11.69426 | 26.25706 | 23.59840 | 22.33061 | 27.50589 | 34.33784 | 23836085.83000 | 32073313.37000 | 25037904.18000 | 22659895.86000 | 21757401.03000 | 24451608.10000 | 21933001.85000 | 55996703.43000 | 57371880.44000 | 62087214.51000 | 82849910.15000 | 88970319.31000 | | Doors canno | 294242 | Fairview | Williamson County | TN | United States | DHL,BARIAN | 2001 | 6355232.31000 | 10198920.36000 | 10246200.97000 | 12209716.50000 | 8566998.28000 | 8806316.81000 | 9789405.60000 | 16466584.88000 | 26443785.61000 | 27016047.80000 | 33660589.67000 | 27462468.62000 | 21.59866 | 34.66167 | 34.82236 | 41.49549 | 29.11548 | 29.92882 | 33.26991 | 55.96273 | 89.87087 | 91.81574 | 114.39764 | 93.33293 | 22645143.09000 | 24487254.60000 | 24925759.42000 | 30503655.27000 | 26558160.29000 | 20976233.52000 | 29895796.09000 | 56002198.38000 | 53488158.53000 | 76287235.46000 | 82483747.59000 | 88088266.69000 | | Important issues liv | 138504 | Fairview | Williamson County | TN | United States | DHL,BARIAN | 2001 | 11748784.55000 | 14351305.77000 | 9896470.93000 | 7990874.78000 | 8879247.90000 | 7362383.09000 | 10011144.75000 | 17741201.32000 | 21346976.05000 | 18074978.16000 | 29675125.64000 | 32545325.29000 | 84.82632 | 103.61654 | 71.45260 | 57.69418 | 64.10824 | 53.15647 | 72.28055 | 128.09162 | 154.12534 | 130.50149 | 214.25465 | 234.97751 | 27204167.15000 | 25980378.13000 | 19943398.93000 | 25710421.13000 | 19484481.03000 | 26346611.48000 | 25075158.43000 | 54094778.13000 | 41066732.11000 | 54547058.28000 | 72465962.92000 | 92770328.27000 | | [NULL] | [NULL] | Fairview | Williamson County | TN | United States | DHL,BARIAN | 2001 | 9597806.95000 | 11121820.57000 | 8670867.91000 | 8994786.04000 | 10887248.09000 | 14187671.36000 | 9732598.41000 | 19798897.07000 | 21007842.34000 | 21495513.67000 | 34795669.17000 | 33122997.94000 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 21913594.59000 | 32518476.51000 | 24885662.72000 | 25698343.86000 | 33735910.61000 | 35527031.58000 | 25465193.48000 | 53623238.66000 | 51409986.76000 | 54159173.90000 | 92227043.25000 | 83435390.84000 | +----------------------+-------------------+----------+-------------------+---------+---------------+---------------+------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+ Query used: -- start query 66 in stream 0 using template query66.tpl and seed QUALIFICATION select w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,ship_carriers ,year ,sum(jan_sales) as jan_sales ,sum(feb_sales) as feb_sales ,sum(mar_sales) as mar_sales ,sum(apr_sales) as apr_sales ,sum(may_sales) as may_sales ,sum(jun_sales) as jun_sales ,sum(jul_sales) as jul_sales ,sum(aug_sales) as aug_sales ,sum(sep_sales) as sep_sales ,sum(oct_sales) as oct_sales ,sum(nov_sales) as nov_sales ,sum(dec_sales) as dec_sales ,sum(jan_sales/w_warehouse_sq_ft) as jan_sales_per_sq_foot ,sum(feb_sales/w_warehouse_sq_ft) as feb_sales_per_sq_foot ,sum(mar_sales/w_warehouse_sq_ft) as mar_sales_per_sq_foot ,sum(apr_sales/w_warehouse_sq_ft) as apr_sales_per_sq_foot ,sum(may_sales/w_warehouse_sq_ft) as may_sales_per_sq_foot ,sum(jun_sales/w_warehouse_sq_ft) as jun_sales_per_sq_foot ,sum(jul_sales/w_warehouse_sq_ft) as jul_sales_per_sq_foot ,sum(aug_sales/w_warehouse_sq_ft) as aug_sales_per_sq_foot ,sum(sep_sales/w_warehouse_sq_ft) as sep_sales_per_sq_foot ,sum(oct_sales/w_warehouse_sq_ft) as oct_sales_per_sq_foot ,sum(nov_sales/w_warehouse_sq_ft) as nov_sales_per_sq_foot ,sum(dec_sales/w_warehouse_sq_ft) as dec_sales_per_sq_foot ,sum(jan_net) as jan_net ,sum(feb_net) as feb_net ,sum(mar_net) as mar_net ,sum(apr_net) as apr_net ,sum(may_net) as may_net ,sum(jun_net) as jun_net ,sum(jul_net) as jul_net ,sum(aug_net) as aug_net ,sum(sep_net) as sep_net ,sum(oct_net) as oct_net ,sum(nov_net) as nov_net ,sum(dec_net) as dec_net from ( select w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,concat('DHL', ',', 'BARIAN') as ship_carriers ,d_year as year ,sum(case when d_moy = 1 then ws_ext_sales_price* ws_quantity else 0 end) as jan_sales ,sum(case when d_moy = 2 then ws_ext_sales_price* ws_quantity else 0 end) as feb_sales ,sum(case when d_moy = 3 then ws_ext_sales_price* ws_quantity else 0 end) as mar_sales ,sum(case when d_moy = 4 then ws_ext_sales_price* ws_quantity else 0 end) as apr_sales ,sum(case when d_moy = 5 then ws_ext_sales_price* ws_quantity else 0 end) as may_sales ,sum(case when d_moy = 6 then ws_ext_sales_price* ws_quantity else 0 end) as jun_sales ,sum(case when d_moy = 7 then ws_ext_sales_price* ws_quantity else 0 end) as jul_sales ,sum(case when d_moy = 8 then ws_ext_sales_price* ws_quantity else 0 end) as aug_sales ,sum(case when d_moy = 9 then ws_ext_sales_price* ws_quantity else 0 end) as sep_sales ,sum(case when d_moy = 10 then ws_ext_sales_price* ws_quantity else 0 end) as oct_sales ,sum(case when d_moy = 11 then ws_ext_sales_price* ws_quantity else 0 end) as nov_sales ,sum(case when d_moy = 12 then ws_ext_sales_price* ws_quantity else 0 end) as dec_sales ,sum(case when d_moy = 1 then ws_net_paid * ws_quantity else 0 end) as jan_net ,sum(case when d_moy = 2 then ws_net_paid * ws_quantity else 0 end) as feb_net ,sum(case when d_moy = 3 then ws_net_paid * ws_quantity else 0 end) as mar_net ,sum(case when d_moy = 4 then ws_net_paid * ws_quantity else 0 end) as apr_net ,sum(case when d_moy = 5 then ws_net_paid * ws_quantity else 0 end) as may_net ,sum(case when d_moy = 6 then ws_net_paid * ws_quantity else 0 end) as jun_net ,sum(case when d_moy = 7 then ws_net_paid * ws_quantity else 0 end) as jul_net ,sum(case when d_moy = 8 then ws_net_paid * ws_quantity else 0 end) as aug_net ,sum(case when d_moy = 9 then ws_net_paid * ws_quantity else 0 end) as sep_net ,sum(case when d_moy = 10 then ws_net_paid * ws_quantity else 0 end) as oct_net ,sum(case when d_moy = 11 then ws_net_paid * ws_quantity else 0 end) as nov_net ,sum(case when d_moy = 12 then ws_net_paid * ws_quantity else 0 end) as dec_net from web_sales ,warehouse ,date_dim ,time_dim ,ship_mode where ws_warehouse_sk = w_warehouse_sk and ws_sold_date_sk = d_date_sk and ws_sold_time_sk = t_time_sk and ws_ship_mode_sk = sm_ship_mode_sk and d_year = 2001 and t_time between 30838 and 30838+28800 and sm_carrier in ('DHL','BARIAN') group by w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,d_year union all select w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,concat('DHL', ',', 'BARIAN') as ship_carriers ,d_year as year ,sum(case when d_moy = 1 then cs_sales_price* cs_quantity else 0 end) as jan_sales ,sum(case when d_moy = 2 then cs_sales_price* cs_quantity else 0 end) as feb_sales ,sum(case when d_moy = 3 then cs_sales_price* cs_quantity else 0 end) as mar_sales ,sum(case when d_moy = 4 then cs_sales_price* cs_quantity else 0 end) as apr_sales ,sum(case when d_moy = 5 then cs_sales_price* cs_quantity else 0 end) as may_sales ,sum(case when d_moy = 6 then cs_sales_price* cs_quantity else 0 end) as jun_sales ,sum(case when d_moy = 7 then cs_sales_price* cs_quantity else 0 end) as jul_sales ,sum(case when d_moy = 8 then cs_sales_price* cs_quantity else 0 end) as aug_sales ,sum(case when d_moy = 9 then cs_sales_price* cs_quantity else 0 end) as sep_sales ,sum(case when d_moy = 10 then cs_sales_price* cs_quantity else 0 end) as oct_sales ,sum(case when d_moy = 11 then cs_sales_price* cs_quantity else 0 end) as nov_sales ,sum(case when d_moy = 12 then cs_sales_price* cs_quantity else 0 end) as dec_sales ,sum(case when d_moy = 1 then cs_net_paid_inc_tax * cs_quantity else 0 end) as jan_net ,sum(case when d_moy = 2 then cs_net_paid_inc_tax * cs_quantity else 0 end) as feb_net ,sum(case when d_moy = 3 then cs_net_paid_inc_tax * cs_quantity else 0 end) as mar_net ,sum(case when d_moy = 4 then cs_net_paid_inc_tax * cs_quantity else 0 end) as apr_net ,sum(case when d_moy = 5 then cs_net_paid_inc_tax * cs_quantity else 0 end) as may_net ,sum(case when d_moy = 6 then cs_net_paid_inc_tax * cs_quantity else 0 end) as jun_net ,sum(case when d_moy = 7 then cs_net_paid_inc_tax * cs_quantity else 0 end) as jul_net ,sum(case when d_moy = 8 then cs_net_paid_inc_tax * cs_quantity else 0 end) as aug_net ,sum(case when d_moy = 9 then cs_net_paid_inc_tax * cs_quantity else 0 end) as sep_net ,sum(case when d_moy = 10 then cs_net_paid_inc_tax * cs_quantity else 0 end) as oct_net ,sum(case when d_moy = 11 then cs_net_paid_inc_tax * cs_quantity else 0 end) as nov_net ,sum(case when d_moy = 12 then cs_net_paid_inc_tax * cs_quantity else 0 end) as dec_net from catalog_sales ,warehouse ,date_dim ,time_dim ,ship_mode where cs_warehouse_sk = w_warehouse_sk and cs_sold_date_sk = d_date_sk and cs_sold_time_sk = t_time_sk and cs_ship_mode_sk = sm_ship_mode_sk and d_year = 2001 and t_time between 30838 and 30838+28800 and sm_carrier in ('DHL','BARIAN') group by w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,d_year ) x group by w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,ship_carriers ,year order by w_warehouse_name limit 100; -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org