[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980183#comment-13980183 ]

Colin Patrick McCabe commented on HDFS-5851:
--------------------------------------------

I took a quick look at the design doc.  I think the focus on "discardable" 
memory makes sense in light of next-gen frameworks like Spark, Tez, etc.  One 
note: Tachyon, Spark's caching layer, does not currently incorporate the 
concept of RDDs, although, as I understand it, that support is planned.  At 
this point it just caches (serialized) files, and I think those semantics 
match up pretty well with what we're talking about here.  The execution 
framework can regenerate the data if needed, so that regeneration support 
does not need to live in HDFS.

I think that some HDFS applications will want the ability to treat multiple 
files as a single eviction unit... i.e., if you evict one file, you evict them 
all.  (Things like Hive tables are multiple files, but probably ought to be 
treated as a single unit for caching purposes.)  There are also some questions 
about when eviction can occur... it seems like it would be very inconvenient to 
do it while the file was being read.  On the other hand, we probably need a 
timeout to prevent a selfish process (or a process on a disconnected node) from 
pinning something in the cache forever by keeping a file open.
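
To make the grouping idea concrete, here is a rough sketch (the class and 
method names are hypothetical; nothing like this exists in HDFS today) of an 
eviction unit that spans multiple files and tracks reader leases, so that a 
stuck or disconnected client cannot pin the unit forever:

{code:java}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: a group of cached files that is evicted as one unit
 * (e.g. all the files backing a Hive table).  Readers take a lease when they
 * open any file in the unit; once every lease has expired, the unit becomes
 * evictable even if a client never closed its handle.
 */
class EvictionUnit {
  private final Set<String> paths =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
  private final AtomicInteger openReaders = new AtomicInteger(0);
  private volatile long leaseExpiryMs = 0;

  void addFile(String path) {
    paths.add(path);
  }

  /** Evicting any member of the unit means evicting all of these files. */
  Set<String> members() {
    return paths;
  }

  void readerOpened(long leaseDurationMs) {
    openReaders.incrementAndGet();
    leaseExpiryMs = System.currentTimeMillis() + leaseDurationMs;
  }

  void readerClosed() {
    openReaders.decrementAndGet();
  }

  /** True if no reader holds the unit, or every lease has timed out. */
  boolean isEvictable(long nowMs) {
    return openReaders.get() == 0 || nowMs > leaseExpiryMs;
  }
}
{code}

The point is just that evictability is a property of the whole unit rather 
than of any single file, and that an expired lease makes the unit evictable 
even if some reader never closed its handle.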

Clearly we want the ability to do things like skip checksums when reading the 
cached files.  This will reuse a lot of the HDFS-4949 code.  It's less clear 
what other aspects of the HDFS-4949 code we'll want to reuse.  I think cache 
pools might be one such thing.  There is also potential to reuse some of the 
implementation, such as the mlock support.  An mlocked file in /dev/shm could 
be a good way to go here.
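
As a purely illustrative sketch of that last idea, here is roughly what 
mapping and pinning a file on a tmpfs mount could look like, going through 
the same NativeIO mlock hook the HDFS-4949 cache path (MappableBlock) already 
uses.  The path and class name below are made up for the example:

{code:java}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

import org.apache.hadoop.io.nativeio.NativeIO;

public class ShmPinSketch {
  public static void main(String[] args) throws Exception {
    // Illustrative path only; a real layout would be a DN-managed directory.
    String path = "/dev/shm/example-replica";
    try (RandomAccessFile raf = new RandomAccessFile(path, "r");
         FileChannel ch = raf.getChannel()) {
      long len = ch.size();
      // tmpfs pages are already RAM-backed; mmap gives us a direct view.
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, len);
      // Pin the pages so they cannot be swapped out under memory pressure.
      // Same hook the HDFS-4949 caching code uses when caching block files.
      NativeIO.POSIX.getCacheManipulator().mlock(path, buf, len);
      // ... serve reads from buf; checksum verification could be skipped
      // since the data never left memory after it was written.
    }
  }
}
{code}

Since tmpfs data is already in RAM, the mlock is only there to keep the pages 
from being swapped out while the replica is pinned.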

I am free all of next week, except for Friday.  Let's schedule a webex so we 
can figure this stuff out.

> Support memory as a storage medium
> ----------------------------------
>
>                 Key: HDFS-5851
>                 URL: https://issues.apache.org/jira/browse/HDFS-5851
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 3.0.0
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: 
> SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf
>
>
> Memory can be used as a storage medium for smaller/transient files for fast 
> write throughput.
> More information/design will be added later.



