[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980183#comment-13980183 ]
Colin Patrick McCabe commented on HDFS-5851:
--------------------------------------------

I took a quick look at the design doc. I think the focus on "discardable" memory makes sense in light of next-gen frameworks like Spark, Tez, etc.

One note: Tachyon, Spark's caching layer, does not currently incorporate the concept of RDDs, although that support is planned, as I understand it. It's just caching (serialized) files at this point, and I think the semantics match up pretty well with what we're talking about here. The execution framework can re-generate the data if needed... this re-generation support does not need to be included in HDFS.

I think that some HDFS applications will want the ability to treat multiple files as a single eviction unit... i.e., if you evict one file, you evict them all. (Things like Hive tables are multiple files, but probably ought to be treated as a single unit for caching purposes.)

There are also some questions about when eviction can occur... it seems like it would be very inconvenient to do it while the file was being read. On the other hand, we probably need a timeout to prevent a selfish process (or a process on a disconnected node) from pinning something in the cache forever by keeping a file open.

Clearly we want the ability to do things like skip checksums when reading the cached files. This will reuse a lot of the HDFS-4949 code. It's less clear what other aspects of the HDFS-4949 code we'll want to reuse. I think cache pools might be one such thing. There is also potential to reuse some of the implementation, such as mlocking and so forth. An mlocked file in /dev/shm could be a good way to go here.

I am free all of next week, except for Friday. Let's schedule a WebEx so we can figure this stuff out.
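To make the "multiple files as one eviction unit, with a pin timeout" idea concrete, here is a minimal sketch. Everything in it (the `EvictionUnit` class, its `pin`/`unpin`/`evictable` methods, and the timeout policy) is hypothetical illustration, not existing HDFS code: a unit tracks a group of paths (e.g. the files of one Hive table), readers pin it while reading, and stale pins past the timeout no longer block eviction, so a disconnected reader cannot hold the unit in cache forever.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a multi-file eviction unit with a pin timeout.
class EvictionUnit {
    private final List<String> paths;                       // all files evicted together
    private final Duration pinTimeout;                      // stale pins stop counting after this
    private final Map<Long, Instant> pins = new HashMap<>(); // readerId -> time of pin

    EvictionUnit(List<String> paths, Duration pinTimeout) {
        this.paths = paths;
        this.pinTimeout = pinTimeout;
    }

    synchronized void pin(long readerId)   { pins.put(readerId, Instant.now()); }
    synchronized void unpin(long readerId) { pins.remove(readerId); }

    // The unit is evictable once no pin younger than the timeout remains;
    // expired pins are dropped here rather than blocking eviction forever.
    synchronized boolean evictable(Instant now) {
        pins.values().removeIf(t -> Duration.between(t, now).compareTo(pinTimeout) > 0);
        return pins.isEmpty();
    }

    // Eviction is all-or-nothing: dropping the unit drops every file in it.
    synchronized List<String> filesToDrop() { return paths; }
}
```

An active reader keeps the whole unit cached; once its pin expires (or it unpins), the cache is free to drop all of the unit's files in one step.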
> Support memory as a storage medium
> ----------------------------------
>
>                 Key: HDFS-5851
>                 URL: https://issues.apache.org/jira/browse/HDFS-5851
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 3.0.0
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf
>
>
> Memory can be used as a storage medium for smaller/transient files for fast
> write throughput.
> More information/design will be added later.

--
This message was sent by Atlassian JIRA
(v6.2#6252)