[ 
https://issues.apache.org/jira/browse/HIVE-22181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qingfa zhou updated HIVE-22181:
-------------------------------
    Attachment: image-2019-09-10-12-40-53-338.png

> Same query but different result occasionally
> --------------------------------------------
>
>                 Key: HIVE-22181
>                 URL: https://issues.apache.org/jira/browse/HIVE-22181
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.3.0
>         Environment: hadoop:2.7.3.2.6.1.0-129
> hive:2.3.0
> tez:0.8.4
> java:1.8.0_212 
>            Reporter: qingfa zhou
>            Priority: Major
>         Attachments: WX20190910-123604.png, WX20190910-123621.png, 
> WX20190910-123641.png, image-2019-09-10-12-40-34-958.png, 
> image-2019-09-10-12-40-53-338.png
>
>
> h3. 1.Phenomenon
>     Running the same SQL(hive on tez) several times will result in 
> inconsistent results . The following results are generated respectively
>     1)
>      OK
>      3951864 808
>     2)
>      OK
>      1822979 353
>     But the first result is true.
> h3. 2.SQL
> select
>  count(1),count(distinct store_code)
>  from 
>  (
>  select
>  store_code,
>  all_pay_id,
>  payable_price,
>  row_number()over(partition by store_code order by payable_price desc) as rw,
>  count(1)over(partition by store_code) as store_user
>  from 
>  (
>  select 
>  store_code,
>  all_pay_id,
>  sum(payable_price) as payable_price
>  from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a
>  where dt='20190904'
>  group by store_code,all_pay_id
>  ) a 
>  order by rw 
>  ) a 
>  where rw/store_user<=0.8
>  ;
>  
> h3. 3.Troubleshooting
>     I lookup task result from tez web ui,give the result as follows:
>            the first: 
>         the second:
> !image-2019-09-10-12-18-08-769.png!
>       so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much 
> lower than the first.
>      This is detailed info of the second result from reducer_3 to reducer_4 
> !image-2019-09-10-12-26-41-681.png!  
> This is execution plan of the sql:
> OK
>  Vertex dependency in root stage
>  Reducer 2 <- Map 1 (SIMPLE_EDGE)
>  Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>  Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
>  Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
>  Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> Stage-0
>  Fetch Operator
>  limit:-1
>  Stage-1
>  Reducer 6
>  File Output Operator [FS_24]
>  Group By Operator [GBY_22] (rows=1 width=128)
>  Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT 
> KEY._col0:0._col0)"]
>  <-Reducer 5 [SIMPLE_EDGE]
>  SHUFFLE [RS_21]
>  Group By Operator [GBY_20] (rows=5619870 width=2328)
>  Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT 
> _col0)"],keys:_col0
>  Select Operator [SEL_17] (rows=5619870 width=2328)
>  Output:["_col0"]
>  <-Reducer 4 [SIMPLE_EDGE]
>  SHUFFLE [RS_16]
>  Select Operator [SEL_14] (rows=5619870 width=2328)
>  Output:["_col0","_col3"]
>  Filter Operator [FIL_25] (rows=5619870 width=2328)
>  predicate:((_col0 / count_window_1) <= 0.8)
>  PTF Operator [PTF_13] (rows=16859610 width=2328)
>  Function definitions:[{},\\{"name:":"windowingtablefunction","order 
> by:":"_col1 ASC NULLS FIRST","partition by:":"_col1"}]
>  Select Operator [SEL_12] (rows=16859610 width=2328)
>  Output:["_col0","_col1"]
>  <-Reducer 3 [SIMPLE_EDGE]
>  SHUFFLE [RS_11]
>  PartitionCols:_col0
>  Select Operator [SEL_10] (rows=16859610 width=2328)
>  Output:["_col0","row_number_window_0"]
>  PTF Operator [PTF_9] (rows=16859610 width=2328)
>  Function definitions:[{},\\{"name:":"windowingtablefunction","order 
> by:":"_col2 DESC NULLS LAST","partition by:":"_col0"}]
>  Select Operator [SEL_8] (rows=16859610 width=2328)
>  Output:["_col0","_col2"]
>  <-Reducer 2 [SIMPLE_EDGE]
>  SHUFFLE [RS_7]
>  PartitionCols:_col0
>  Select Operator [SEL_6] (rows=16859610 width=2328)
>  Output:["_col0","_col2"]
>  Group By Operator [GBY_5] (rows=16859610 width=2328)
>  
> Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
>  KEY._col1
>  <-Map 1 [SIMPLE_EDGE]
>  SHUFFLE [RS_4]
>  PartitionCols:_col0, _col1
>  Group By Operator [GBY_3] (rows=33719220 width=2328)
>  
> Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code,
>  all_pay_id
>  Select Operator [SEL_2] (rows=33719220 width=2328)
>  Output:["store_code","all_pay_id","payable_price"]
>  TableScan [TS_0] (rows=33719220 width=2328)
>  
> data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"]
>  
>     Reducer 4 show PTF , so I don't know if this result has anything to do 
> with PTF? Can you help me.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to