reevaluate limiting the number of open files given HDFS improvements
--------------------------------------------------------------------
Key: ACCUMULO-416
URL: https://issues.apache.org/jira/browse/ACCUMULO-416
Project: Accumulo
Issue Type: Improvement
Components: tserver
Reporter: Adam Fuchs
Assignee: Keith Turner
Tablet servers limit the number of files that can be opened for scans and for
major compactions. The two main reasons for this limit were to reduce our impact
on HDFS, primarily regarding connections to data nodes, and to limit our memory
usage related to preloading file indexes. A third reason might be that disk
thrashing could become a problem if we try to read from too many places at once.

Two improvements may have made (or may soon make) this limit obsolete: HDFS now
pools connections, and RFile now uses a multi-level index. With these
improvements, is it reasonable to lift some of our open file restrictions? The
tradeoff on the query side might be availability vs. overall resource usage. On
the compaction side, the tradeoff is probably write replication (the same data
being rewritten, and re-replicated by HDFS, in repeated compaction passes) vs.
thrashing on reads. I think we can make an argument that queries should be
available at almost any cost, but the compaction tradeoff is not as clear. We
should test the efficiency of compacting a large number of files to get a better
feel for how the two extremes affect read and write performance across the system.
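One way to get that data would be to raise the limits system-wide and then force
and time a full major compaction of a table that already has many small files,
comparing it against repeated narrow passes under the default limits. The sketch
below uses the 1.4-era Java client; the property names
(tserver.scan.files.open.max, tserver.compaction.major.thread.files.open.max),
the instance/credentials, and the table name "manyfiles" are assumptions for the
test cluster, not values taken from this issue.

{code:java}
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.ZooKeeperInstance;

public class OpenFileLimitBenchmark {
  public static void main(String[] args) throws Exception {
    // Hypothetical instance name, zookeepers, and credentials for a test cluster.
    Instance inst = new ZooKeeperInstance("test", "zkhost:2181");
    Connector conn = inst.getConnector("root", "secret".getBytes());

    // Assumed property names for the scan-side and compaction-side open file
    // limits; verify against the Property enum of the deployed version.
    conn.instanceOperations().setProperty("tserver.scan.files.open.max", "1000");
    conn.instanceOperations().setProperty(
        "tserver.compaction.major.thread.files.open.max", "1000");

    // Force a full major compaction of a table that already has many files and
    // time it, so one wide compaction can be compared against several narrow
    // passes made under the default limits.
    long start = System.currentTimeMillis();
    conn.tableOperations().compact("manyfiles", null, null, true, true);
    System.out.println("compaction took " + (System.currentTimeMillis() - start) + " ms");
  }
}
{code}

Repeating the run with the default limits would show how much time and extra
write I/O the intermediate passes cost, which is the compaction-side half of the
tradeoff described above.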