[jira] [Commented] (KYLIN-3388) Data may become not correct if mappers fail during the cube building step, "distribute by rand()"

2018-06-09 Thread liyang (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507197#comment-16507197
 ] 

liyang commented on KYLIN-3388:
---

Really!! This sounds like a critical bug in Hive.

> Data may become not correct if mappers fail during the cube building step, 
> "distribute by rand()"
> -
>
> Key: KYLIN-3388
> URL: https://issues.apache.org/jira/browse/KYLIN-3388
> Project: Kylin
> Issue Type: Bug
> Reporter: Zhong Yanghong
> Priority: Critical
> Attachments: Hive Issue - distribute by rand().png
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3388) Data may become not correct if mappers fail during the cube building step, "distribute by rand()"

2018-05-27 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492271#comment-16492271
 ] 

Zhong Yanghong commented on KYLIN-3388:
---

There are two ways to deal with this issue:
* Disable the "distribute by" feature by setting 
kylin.source.hive.redistribute-flat-table=false
* Distribute by multiple columns rather than by rand(), which needs a patch.
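
For illustration only, the difference between the two kinds of redistribution can be sketched as the Hive statement each would issue. This is an assumption about the statement's shape, not the actual Kylin patch: `redistribute_sql`, the table name and the column names are hypothetical.

```python
def redistribute_sql(table, columns=None):
    """Build a Hive statement that rewrites `table` with a DISTRIBUTE BY clause.

    Hypothetical helper: with `columns`, rows are routed by their own values,
    so a re-run mapper sends each row to the same reducer as before. Without
    `columns`, it falls back to RAND(), which is not repeatable across retries.
    """
    if columns:
        key = ", ".join(f"`{c}`" for c in columns)
    else:
        key = "RAND()"
    return (
        f"INSERT OVERWRITE TABLE `{table}` "
        f"SELECT * FROM `{table}` DISTRIBUTE BY {key}"
    )


if __name__ == "__main__":
    # Table and column names below are made up for the example.
    print(redistribute_sql("flat_table"))
    print(redistribute_sql("flat_table", ["seller_id", "part_dt"]))
```

Columns with many distinct values would presumably be preferred, so that rows still spread evenly across reducers.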



[jira] [Commented] (KYLIN-3388) Data may become not correct if mappers fail during the cube building step, "distribute by rand()"

2018-05-27 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492269#comment-16492269
 ] 

Zhong Yanghong commented on KYLIN-3388:
---

!Hive Issue - distribute by rand().png!
As the figure above shows, after the map step the data for the reducers has 
been prepared. Suppose R1 runs first. It pulls data D1,1 & D2,1 from the 
mappers and finishes. Then R2 begins to run. Unluckily, M2 is unavailable by 
this time, so R2 asks for another mapper, M'2, to be started. After M'2 has 
prepared data D'2,1 & D'2,2, R2 pulls D1,2 from M1 and D'2,2 from M'2, and 
finally finishes its job.

The input for the reducers thus becomes D1,1 & D2,1, D1,2 & D'2,2, rather than 
D1,1 & D2,1, D1,2 & D2,2. Since the partitioner for this Hive job is not fixed, 
D2,2 and D'2,2 are rarely the same: a row that M2 sent to R1 may be sent to R2 
again by M'2 (a duplicate), and a row that M2 sent to R2 may be routed by M'2 
to R1, which has already finished (a loss). Therefore, the final result 
becomes incorrect.
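
The scenario can be reproduced with a small simulation. This is only a sketch, not Hive's or Kylin's actual code: it assumes two mappers and two reducers as in the figure, and `run_mapper`, `simulate`, `rand_partitioner` and `column_partitioner` are hypothetical names.

```python
import random
import zlib
from collections import Counter


def run_mapper(rows, partition, num_reducers=2):
    """One mapper attempt: bucket its rows into one partition per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[partition(row) % num_reducers].append(row)
    return buckets


def simulate(m1_rows, m2_rows, make_partitioner):
    """R1 reads from M1 and M2; M2 is then lost and re-run as M'2 before R2 reads."""
    m1 = run_mapper(m1_rows, make_partitioner("M1 attempt 1"))
    m2 = run_mapper(m2_rows, make_partitioner("M2 attempt 1"))
    r1_input = m1[0] + m2[0]            # D1,1 & D2,1
    m2_retry = run_mapper(m2_rows, make_partitioner("M2 attempt 2"))
    r2_input = m1[1] + m2_retry[1]      # D1,2 & D'2,2
    # Count how many times each row reaches some reducer.
    return Counter(r1_input + r2_input)


def rand_partitioner(attempt):
    """Like "distribute by rand()": every attempt draws fresh random targets."""
    rng = random.Random(attempt)
    return lambda row: rng.randrange(2)


def column_partitioner(attempt):
    """Like distributing by column values: the target depends only on the row."""
    return lambda row: zlib.crc32(repr(row).encode("utf-8"))


if __name__ == "__main__":
    m1_rows = [("m1", i) for i in range(8)]
    m2_rows = [("m2", i) for i in range(8)]
    expected = Counter(m1_rows + m2_rows)
    for name, factory in [("columns", column_partitioner), ("rand()", rand_partitioner)]:
        result = simulate(m1_rows, m2_rows, factory)
        print(name, "-> every row seen exactly once:", result == expected)
```

In this model a row from M2 is counted once only if both attempts route it to the same reducer. With a value-based partitioner that always holds; with a random one it holds only by chance.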
