[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

2018-01-24 Thread Gopal V (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-17124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-17124:
---
Status: Open  (was: Patch Available)

> PlanUtils: Rand() is not a failure-tolerant distribution column
> ---
>
> Key: HIVE-17124
> URL: https://issues.apache.org/jira/browse/HIVE-17124
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-17124.1.patch
>
>
> {code}
> else {
>   // numPartitionFields = -1 means random partitioning
>   partitionCols.add(TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("rand"));
> }
> {code}
> This causes known data corruption during failure-tolerance operations: rand() is
> unseeded, so a re-executed task attempt routes the same input rows to different
> reducers than the original attempt did, and downstream tasks that mix outputs from
> different attempts can see rows duplicated or missing.
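> For illustration only, a standalone sketch (assumed class and method names, not Hive
> code) of why an unseeded random partition key is not retry-safe:
> {code}
> import java.util.Random;
>
> // Standalone illustration: with an unseeded RNG, a retried task attempt almost
> // certainly routes rows differently than the original attempt did.
> public class UnseededRoutingDemo {
>   static int route(Random rng, int numReducers) {
>     // Map a random key to a reducer, like hashing a rand() partition column.
>     return Math.floorMod(rng.nextInt(), numReducers);
>   }
>
>   public static void main(String[] args) {
>     int numReducers = 4;
>     Random originalAttempt = new Random();  // unseeded, like rand() with no argument
>     Random retriedAttempt = new Random();   // a re-execution gets a different seed
>     for (int row = 0; row < 5; row++) {
>       System.out.printf("row %d -> original: reducer %d, retry: reducer %d%n",
>           row, route(originalAttempt, numReducers), route(retriedAttempt, numReducers));
>     }
>   }
> }
> {code}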
> There is already a failure-tolerant distribution mechanism inside ReduceSinkOperator,
> which kicks in automatically when no partition columns are specified:
> {code}
> if (partitionEval.length == 0) {
>   // If no partition cols, just distribute the data uniformly to provide better
>   // load balance. If the requirement is to have a single reducer, we should set
>   // the number of reducers to 1. Use a constant seed to make the code deterministic.
>   if (random == null) {
>     random = new Random(12345);
>   }
>   keyHashCode = random.nextInt();
> }
> {code}
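> The constant seed is what makes that path retry-safe: a re-executed task recreates
> Random(12345) and regenerates exactly the same sequence of key hash codes, so every
> row is routed to the same reducer as in the original attempt. A minimal standalone
> sketch of that property (illustrative only, not the attached patch):
> {code}
> import java.util.Random;
>
> // Both "attempts" seed with the same constant, so they reproduce the same
> // hash codes and therefore the same reducer assignments.
> public class SeededRoutingDemo {
>   public static void main(String[] args) {
>     int numReducers = 4;
>     Random originalAttempt = new Random(12345);
>     Random retriedAttempt = new Random(12345);
>     for (int row = 0; row < 5; row++) {
>       int r1 = Math.floorMod(originalAttempt.nextInt(), numReducers);
>       int r2 = Math.floorMod(retriedAttempt.nextInt(), numReducers);
>       System.out.printf("row %d -> original: %d, retry: %d, identical=%b%n",
>           row, r1, r2, r1 == r2);
>     }
>   }
> }
> {code}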



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

2018-01-24 Thread Gopal V (JIRA)

Gopal V updated HIVE-17124:
---
Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

2018-01-19 Thread Gopal V (JIRA)

Gopal V updated HIVE-17124:
---
Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

2018-01-19 Thread Gopal V (JIRA)

Gopal V updated HIVE-17124:
---
Status: Open  (was: Patch Available)


[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

2017-07-21 Thread Gopal V (JIRA)

Gopal V updated HIVE-17124:
---
Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

2017-07-21 Thread Gopal V (JIRA)

Gopal V updated HIVE-17124:
---
Status: Open  (was: Patch Available)


[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

2017-07-19 Thread Gopal V (JIRA)

Gopal V updated HIVE-17124:
---
Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

2017-07-19 Thread Gopal V (JIRA)

Gopal V updated HIVE-17124:
---
Attachment: HIVE-17124.1.patch



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)