[ 
https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868473#comment-13868473
 ] 

Arpit Agarwal commented on HDFS-5751:
-------------------------------------

Hi David,

{quote}
The FsDatasetSpi interface is valuable in that it allows people to experiment 
with or deliver solutions on alternate storage systems without needing to make 
and maintain changes to the rest of the Datanode.
{quote}
I think the reverse is true. It increases the implementation burden and almost 
necessitates code duplication between the official {{FsDatasetImpl}} and 
alternate implementations. That is, any implementation must know about HDFS 
internal concepts like block states, maintain a block map with all the 
complexity that comes with it (such as reconciling block state differences 
between memory and disk), take part in replica recovery, and probably more.

This is like rewriting parts of ext3fs to support a new storage medium when all 
that should be required is a new device driver. I think we can make it easier 
for implementers by moving the abstraction a few levels lower. What do you 
think?
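
To make the "device driver" analogy concrete, here is a rough sketch of what a 
lower-level abstraction could look like. All names below ({{ReplicaStore}}, 
{{InMemoryReplicaStore}}, {{SketchDataset}}) are hypothetical and are not 
existing HDFS classes, and the two block states are a deliberate 
simplification of the real replica states. The point is only that the block 
map and state handling live in one shared class, while an alternate storage 
medium implements just the raw read/write/delete calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical low-level storage interface: the "device driver". */
interface ReplicaStore {
    void write(long blockId, byte[] data);
    byte[] read(long blockId);
    void delete(long blockId);
}

/** Example backend that keeps bytes in memory; a disk or other medium
 *  would be another implementation of the same small interface. */
class InMemoryReplicaStore implements ReplicaStore {
    private final Map<Long, byte[]> blocks = new ConcurrentHashMap<>();

    public void write(long blockId, byte[] data) {
        blocks.put(blockId, data.clone());
    }

    public byte[] read(long blockId) {
        byte[] data = blocks.get(blockId);
        return data == null ? null : data.clone();
    }

    public void delete(long blockId) {
        blocks.remove(blockId);
    }
}

/** Shared dataset logic: the block map and block states are maintained
 *  here once, independent of the storage backend. */
class SketchDataset {
    /** Simplified states for illustration only. */
    enum State { BEING_WRITTEN, FINALIZED }

    private final Map<Long, State> blockMap = new ConcurrentHashMap<>();
    private final ReplicaStore store;

    SketchDataset(ReplicaStore store) {
        this.store = store;
    }

    void writeAndFinalize(long blockId, byte[] data) {
        blockMap.put(blockId, State.BEING_WRITTEN);
        store.write(blockId, data);
        blockMap.put(blockId, State.FINALIZED);
    }

    byte[] readFinalized(long blockId) {
        if (blockMap.get(blockId) != State.FINALIZED) {
            throw new IllegalStateException(
                "block " + blockId + " is not finalized");
        }
        return store.read(blockId);
    }

    State stateOf(long blockId) {
        return blockMap.get(blockId);
    }
}
```

With a split along these lines, supporting a new medium would mean writing 
only another {{ReplicaStore}}, for example 
{{new SketchDataset(new InMemoryReplicaStore())}} for tests, without 
duplicating the block map or state logic.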

I will also read through HDFS-5194 and your design doc.

> Remove the FsDatasetSpi and FsVolumeSpi interfaces
> --------------------------------------------------
>
>                 Key: HDFS-5751
>                 URL: https://issues.apache.org/jira/browse/HDFS-5751
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, test
>    Affects Versions: 3.0.0
>            Reporter: Arpit Agarwal
>
> The in-memory block map and disk interface portions of the DataNode have been 
> abstracted out into an {{FsDatasetSpi}} interface, which further uses 
> {{FsVolumeSpi}} to represent individual volumes.
> The abstraction is useful as it allows DataNode tests to use a 
> {{SimulatedFSDataset}} which does not write any data to disk. Instead it just 
> stores block metadata in memory and returns zeroes for all reads. This is 
> useful for both unit testing and for simulating arbitrarily large datanodes 
> without having to provision real disk capacity.
> A 'real' DataNode uses {{FsDatasetImpl}}. Both {{FsDatasetImpl}} and 
> {{SimulatedFSDataset}} implement {{FsDatasetSpi}}.
> However, there are a few problems with this approach:
> # Using the factory class significantly complicates the code flow for the 
> common case. This makes the code harder to understand and debug.
> # There is the additional burden of maintaining two different dataset 
> implementations.
> # Fidelity between the two implementations is poor.
> Instead we can eliminate the SPIs and just hide the disk read/write routines 
> with a dependency injection framework like Google Guice.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
