[ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117568#comment-16117568 ]
Chao Sun commented on HIVE-16758:
---------------------------------

+1 on patch v3.

> Better Select Number of Replications
> ------------------------------------
>
>                 Key: HIVE-16758
>                 URL: https://issues.apache.org/jira/browse/HIVE-16758
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HIVE-16758.1.patch, HIVE-16758.2.patch, HIVE-16758.3.patch
>
>
> {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
> We should be smarter about how we pick a replication number. We should add a
> new configuration equivalent to {{mapreduce.client.submit.file.replication}}.
> This value should be around the square root of the number of nodes and not
> hard-coded in the code.
> {code}
> public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
> private int minReplication = 10;
>
> @Override
> protected void initializeOp(Configuration hconf) throws HiveException {
>   ...
>   int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
>   // minReplication value should not cross the value of dfs.replication.max
>   minReplication = Math.min(minReplication, dfsMaxReplication);
> }
> {code}
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
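As a side note, the selection rule the description proposes (roughly the square root of the cluster size, capped by {{dfs.replication.max}}) can be sketched as plain Java. This is a minimal illustration of the described heuristic, not the code in the attached patches; the class and method names here are hypothetical.

{code}
// Hypothetical sketch of the replication-selection heuristic described in the
// issue: pick about sqrt(number of nodes), but never exceed dfs.replication.max.
public class ReplicationPicker {

    static int chooseReplication(int numNodes, int dfsReplicationMax) {
        // Square root of the cluster size, rounded up, with a floor of 1.
        int byClusterSize = Math.max(1, (int) Math.ceil(Math.sqrt(numNodes)));
        // The chosen value must not cross dfs.replication.max.
        return Math.min(byClusterSize, dfsReplicationMax);
    }
}
{code}

For example, a 100-node cluster with the default {{dfs.replication.max}} of 512 would yield a replication factor of 10, matching the previously hard-coded default, while a 4-node cluster would use only 2.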