[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

[ https://issues.apache.org/jira/browse/HIVE-17124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-17124:
---------------------------
    Status: Open  (was: Patch Available)

> PlanUtils: Rand() is not a failure-tolerant distribution column
> ---------------------------------------------------------------
>
>                 Key: HIVE-17124
>                 URL: https://issues.apache.org/jira/browse/HIVE-17124
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.3.0, 3.0.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Major
>         Attachments: HIVE-17124.1.patch
>
> {code}
>     else {
>       // numPartitionFields = -1 means random partitioning
>       partitionCols.add(TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("rand"));
>     }
> {code}
> This causes known data corruption during failure-tolerance operations:
> rand() is re-evaluated when a failed task is retried, so the re-executed
> rows can hash to different reducers than they did in the original attempt.
> There is a failure-tolerant distribution function inside ReduceSinkOperator,
> which kicks in automatically when no partition columns are used:
> {code}
>     if (partitionEval.length == 0) {
>       // If no partition cols, just distribute the data uniformly
>       // to provide better load balance. If the requirement is to have a
>       // single reducer, we should set the number of reducers to 1.
>       // Use a constant seed to make the code deterministic.
>       if (random == null) {
>         random = new Random(12345);
>       }
>       keyHashCode = random.nextInt();
>     }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
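The contrast between the two snippets can be sketched in a small standalone program: an unseeded Random models what a re-evaluated rand() does across task attempts, while a constant seed reproduces the same bucket sequence on every re-execution. This is only an illustration of the seeding behavior, not Hive code; the bucketFor helper and the reducer count are hypothetical.

```java
import java.util.Random;

public class DeterministicShuffleDemo {
    // Hypothetical helper: map a hash code to one of nReducers buckets.
    static int bucketFor(int keyHashCode, int nReducers) {
        return (keyHashCode & Integer.MAX_VALUE) % nReducers;
    }

    public static void main(String[] args) {
        int nReducers = 4;

        // Unseeded: each "attempt" draws a fresh sequence, so the same
        // logical row may be routed to a different reducer on retry.
        Random attempt1 = new Random();
        Random attempt2 = new Random();
        System.out.println("attempt 1 bucket: " + bucketFor(attempt1.nextInt(), nReducers));
        System.out.println("attempt 2 bucket: " + bucketFor(attempt2.nextInt(), nReducers));

        // Constant seed: java.util.Random guarantees the same seed yields
        // the same sequence, so a re-executed task reproduces its routing.
        Random original = new Random(12345);
        Random retry = new Random(12345);
        System.out.println(bucketFor(original.nextInt(), nReducers)
                == bucketFor(retry.nextInt(), nReducers)); // prints "true"
    }
}
```

The constant-seed property is what makes the ReduceSinkOperator path safe to re-run, and is exactly what a planner-injected rand() column lacks.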
[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

Gopal V updated HIVE-17124:
---------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

Gopal V updated HIVE-17124:
---------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

Gopal V updated HIVE-17124:
---------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

Gopal V updated HIVE-17124:
---------------------------
    Attachment: HIVE-17124.1.patch