BELUGA BEHR created HIVE-16758:
----------------------------------
Summary: Better Select Number of Replications
Key: HIVE-16758
URL: https://issues.apache.org/jira/browse/HIVE-16758
Project: Hive
Issue Type: Improvement
Reporter: BELUGA BEHR
Priority: Minor
{{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
We should be smarter about how we pick the replication factor. We should add a
new configuration property equivalent to {{mapreduce.client.submit.file.replication}}.
As with that property, the value should be around the square root of the number
of nodes, rather than hard-coded as it is today:
{code}
public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
private int minReplication = 10;

@Override
protected void initializeOp(Configuration hconf) throws HiveException {
  ...
  int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
  // minReplication value should not cross the value of dfs.replication.max
  minReplication = Math.min(minReplication, dfsMaxReplication);
}
{code}
https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
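A rough sketch of the square-root heuristic, for discussion only and not a proposed patch. The class name {{ReplicationSketch}} and the method {{chooseReplication}} are hypothetical, and the sketch assumes the caller already knows the number of nodes and the value of {{dfs.replication.max}}; how Hive would obtain the node count is left open.

```java
/**
 * Hypothetical helper illustrating the heuristic described in this issue.
 * None of these names exist in Hive; they are assumptions for illustration.
 */
final class ReplicationSketch {

    private ReplicationSketch() {
        // Static utility, not meant to be instantiated.
    }

    /**
     * Picks a replication factor close to sqrt(numNodes), never below 1
     * and never above dfsMaxReplication (the dfs.replication.max value).
     */
    static int chooseReplication(int numNodes, int dfsMaxReplication) {
        if (numNodes <= 0) {
            // Unknown or invalid cluster size: fall back to a single replica.
            return 1;
        }
        // Round up so that a 17-node cluster gets 5 replicas rather than 4.
        int target = (int) Math.ceil(Math.sqrt(numNodes));
        // Keep the existing rule: do not exceed dfs.replication.max.
        int capped = Math.min(target, dfsMaxReplication);
        // Guard against a misconfigured maximum below 1.
        return Math.max(1, capped);
    }
}
```

With this rule a 100-node cluster would get a replication of 10, which matches the value currently hard-coded, while smaller and larger clusters would scale accordingly. In practice the computed value would probably serve only as a default, with the new configuration property taking precedence when it is set.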