[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157145#comment-14157145 ]

Colin Patrick McCabe commented on HDFS-6988:
--------------------------------------------

I'm trying to understand the process for configuring this.  First, there is the 
decision of how big to make the ramdisk.  This is something a sysadmin (or 
management software) needs to do ahead of time, and it is clearly going to be 
expressed as a number of bytes.  Then there is setting 
{{dfs.datanode.ram.disk.low.watermark.percent}}, which determines how much of 
the ramdisk we will try to keep free.  Finally, there is 
{{dfs.datanode.ram.disk.low.watermark.replicas}}; I'm not sure when you would 
set this one.
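
To make sure I'm reading the intent right, here is roughly how I picture the 
two settings being combined.  The method name, the signature, and the max() 
choice are my own illustration, not something taken from the patch:

{code:java}
// Sketch only: the names and the max() combination are guesses, not the patch's code.
public class WatermarkSketch {
  /**
   * Free space the DataNode would try to keep on the RAM disk before the
   * lazy writer starts evicting replicas.
   */
  static long lowWatermarkBytes(long ramDiskCapacityBytes,
                                int watermarkPercent,      // ...low.watermark.percent
                                int watermarkReplicas,     // ...low.watermark.replicas
                                long dataNodeBlockSize) {  // dfs.blocksize on the DN
    long fromPercent  = (long) (ramDiskCapacityBytes * (watermarkPercent / 100.0));
    long fromReplicas = watermarkReplicas * dataNodeBlockSize;
    // Presumably the larger value wins, so small disks get a replica-based
    // floor while large disks scale with the percentage.
    return Math.max(fromPercent, fromReplicas);
  }

  public static void main(String[] args) {
    // 50 GB RAM disk, 1 percent, 3 replicas, 64 MB blocks -> the percent term
    // (~512 MB) wins over the replica term (~192 MB).
    System.out.println(lowWatermarkBytes(50L << 30, 1, 3, 64L << 20));
  }
}
{code}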

I don't like the fact that {{dfs.datanode.ram.disk.low.watermark.percent}} is 
an int.  In a year or two, we may find that 100 GB ramdisks are common.  Then 
the sysadmin gets a choice between specifying 0% (0 bytes free) and 1% (try to 
keep 1 GB free).  Making this a float would be better, I think...
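
Concretely, on a 100 GB ramdisk (0.25% below is just an example of a value an 
int can't express):

{code:java}
public class PercentGranularity {
  public static void main(String[] args) {
    long ramDisk = 100L << 30;                          // 100 GB RAM disk

    // With an int, the only choices near the bottom are 0% and 1%:
    System.out.println(ramDisk * 0 / 100);              // 0% -> nothing kept free
    System.out.println(ramDisk * 1 / 100);              // 1% -> ~1 GB kept free

    // With a float, intermediate values become expressible, e.g. 0.25%:
    System.out.println((long) (ramDisk * 0.25 / 100));  // -> ~256 MB kept free
  }
}
{code}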

Why is {{dfs.datanode.ram.disk.low.watermark.replicas}} specified in terms of a 
number of replicas?  Block size is a per-replica property: I could easily have 
a client that writes 256 MB or 1 GB replicas while the DataNode is configured 
with {{dfs.blocksize}} at 64 MB.  It's pretty common for formats like ORCFile 
and Apache Parquet to use large blocks and seek around within them.  To avoid 
confusion, this property should be given in terms of bytes.  It seems like we 
are translating it into a number of bytes before using it anyway, so why not 
give the user access to that number directly?
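
As a concrete illustration of the mismatch I'm worried about (the numbers are 
made up for the example, and the translation step is my assumption about what 
the DataNode does with the setting):

{code:java}
public class ReplicasVsBytes {
  public static void main(String[] args) {
    long dnBlockSize = 64L << 20;       // dfs.blocksize = 64 MB on the DataNode
    int watermarkReplicas = 3;          // dfs.datanode.ram.disk.low.watermark.replicas

    // What the DataNode would reserve after translating replicas into bytes:
    long assumedWatermark = watermarkReplicas * dnBlockSize;    // 192 MB

    // What a client actually writing 1 GB replicas would need reserved:
    long clientBlockSize = 1L << 30;
    long actualNeed = watermarkReplicas * clientBlockSize;      // 3 GB, a 16x gap

    System.out.printf("assumed=%d bytes, actual=%d bytes%n",
        assumedWatermark, actualNeed);
  }
}
{code}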

bq. I explained this earlier, a single number fails to work well for a range of 
disks and makes configuration mandatory. What would you choose as the default 
value of this single setting. Let's say we choose 1GB or higher. Then we are 
wasting at least 25% of space on a 4GB RAM disk. Or we choose 512MB. Then we 
are not evicting fast enough to keep up with multiple writers on a 50GB disk.

There seems to be a hidden assumption that the number of writers (or the speed 
at which they're writing) will increase with the size of the ramdisk.  I don't 
see why that's true.  In theory, I could have a system with a small ramdisk and 
a high write rate, or a system with a huge ramdisk and a low write rate.  
Doesn't the amount of space I want to keep free scale with the write rate, 
rather than with the total ramdisk size?
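
Rough arithmetic for what I mean, with made-up numbers:

{code:java}
public class HeadroomSketch {
  public static void main(String[] args) {
    // Three writers at ~100 MB/s each, and say eviction takes ~2 seconds to
    // catch up.  Both figures are invented purely for illustration.
    long writeRateBytesPerSec = 3L * (100L << 20);
    long evictionLagSeconds = 2;

    // The headroom needed is write rate times eviction lag...
    long headroomBytes = writeRateBytesPerSec * evictionLagSeconds;  // ~600 MB

    // ...and that ~600 MB answer is the same whether the RAM disk is 4 GB or
    // 50 GB, which is why a percentage of capacity feels like the wrong knob.
    System.out.println("headroom bytes = " + headroomBytes);
  }
}
{code}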

> Add configurable limit for percentage-based eviction threshold
> --------------------------------------------------------------
>
>                 Key: HDFS-6988
>                 URL: https://issues.apache.org/jira/browse/HDFS-6988
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: HDFS-6581
>            Reporter: Arpit Agarwal
>             Fix For: HDFS-6581
>
>         Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction 
> thresholds configurable. The hard-coded thresholds may not be appropriate for 
> very large RAM disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
