[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157145#comment-14157145 ]
Colin Patrick McCabe commented on HDFS-6988:
--------------------------------------------

I'm trying to understand the process for configuring this. First, there is the decision as to how big to make the ramdisk. This is something that a sysadmin (or management software) needs to do ahead of time, and it is clearly going to be done in terms of a number of bytes. Then there is setting {{dfs.datanode.ram.disk.low.watermark.percent}}, which determines how much of the ramdisk we will try to keep free. Then there is {{dfs.datanode.ram.disk.low.watermark.replicas}}; I'm not sure when you would set this one.

I don't like the fact that {{dfs.datanode.ram.disk.low.watermark.percent}} is an int. In a year or two, we may find that 100 GB ramdisks are common. Then the sysadmin gets a choice between specifying 0% (0 bytes free) and 1% (try to keep 1 GB free). Making this a float would be better, I think.

Why is {{dfs.datanode.ram.disk.low.watermark.replicas}} specified in terms of a number of replicas? Block size is a per-replica property: I could easily have a client that writes 256 MB or 1 GB replicas while the DataNode is configured with {{dfs.blocksize}} at 64 MB. It's pretty common for formats like ORCFile and Apache Parquet to use large blocks and seek around within them. This property should be given in terms of bytes to avoid confusion. It seems like we are translating it into a number of bytes before using it anyway, so why not give the user access to that number directly?

bq. I explained this earlier, a single number fails to work well for a range of disks and makes configuration mandatory. What would you choose as the default value of this single setting? Let's say we choose 1 GB or higher. Then we are wasting at least 25% of the space on a 4 GB RAM disk. Or we choose 512 MB. Then we are not evicting fast enough to keep up with multiple writers on a 50 GB disk.
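To make the two objections above concrete, here is a small sketch of how each watermark setting would translate into a free-space target in bytes. This is a hypothetical illustration, not the actual DataNode code; the class and method names are invented, and the 100 GB ramdisk and 64 MB {{dfs.blocksize}} figures are taken from the examples in the comment.

```java
// Hypothetical sketch (not actual HDFS code) of how the watermark
// settings translate into a free-space target in bytes.
public class WatermarkSketch {
    static final long MB = 1024L * 1024;
    static final long GB = 1024L * MB;

    // Integer percent of capacity: the granularity is 1% of the disk.
    static long freeTargetIntPercent(long capacityBytes, int percent) {
        return capacityBytes * percent / 100;
    }

    // Float percent allows sub-1% reservations on large disks.
    static long freeTargetFloatPercent(long capacityBytes, float percent) {
        return (long) (capacityBytes * percent / 100.0);
    }

    // Replica-count watermark: the reserved byte count depends on the
    // DataNode's configured dfs.blocksize, not on how large the replicas
    // being written actually are.
    static long freeTargetReplicas(int replicas, long configuredBlockSize) {
        return replicas * configuredBlockSize;
    }

    public static void main(String[] args) {
        long capacity = 100 * GB;

        // Int percent on a 100 GB ramdisk: the smallest non-zero choice
        // is 1%, i.e. a full 1 GB reservation.
        System.out.println(freeTargetIntPercent(capacity, 1));       // 1 GB

        // Float percent: 0.25% reserves only 256 MB on the same disk.
        System.out.println(freeTargetFloatPercent(capacity, 0.25f)); // 256 MB

        // A replica watermark of 3 with dfs.blocksize = 64 MB reserves
        // 192 MB, even if clients are actually writing 1 GB replicas.
        System.out.println(freeTargetReplicas(3, 64 * MB));          // 192 MB
    }
}
```

The point of the last method is the mismatch: a byte-valued setting would say directly what the replica-based one only implies through the configured block size.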
There seems to be a hidden assumption that the number of writers (or the speed at which they're writing) will increase with the size of the ramdisk. I don't see why that's true. In theory, I could have a system with a small ramdisk and a high write rate, or a system with a huge ramdisk and a low write rate. It seems that the amount of space I want to keep free is related to the write rate, not to a percentage of the total ramdisk size.

> Add configurable limit for percentage-based eviction threshold
> --------------------------------------------------------------
>
>                 Key: HDFS-6988
>                 URL: https://issues.apache.org/jira/browse/HDFS-6988
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: HDFS-6581
>            Reporter: Arpit Agarwal
>             Fix For: HDFS-6581
>
>         Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction
> thresholds configurable. The hard-coded thresholds may not be appropriate for
> very large RAM disks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)