[ http://issues.apache.org/jira/browse/HADOOP-746?page=comments#action_12453700 ]
Doug Cutting commented on HADOOP-746:
-------------------------------------

> The FS needs CRCs to manage replication and validation and should have a 
> uniform internal mechanism.

I think you mean "HDFS needs...", right?  But HDFS is not the only FS we wish 
to support, and not all of these will have a sufficient, end-to-end CRC system.
So a reusable, end-to-end CRC system is useful to Hadoop.  Whether or not that 
suffices for HDFS seems to be what you're answering with a "no", although I'm 
not sure why.  It seems to me that a well-designed reusable, end-to-end CRC 
system could be used by HDFS, so that HDFS doesn't have to re-invent it all.  
The CRC system could, e.g., make CRCs available along with data buffers.  Maybe 
that's more pain than it's worth, and it would in fact be simpler to have two 
CRC systems, one built into HDFS and a reusable one that's disabled in HDFS 
but used by other FSes.  Is that what you're arguing?
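
To make the "CRCs available along with data buffers" idea concrete, here is a
minimal sketch in Java. It is not Hadoop's API: the class ChecksummedBuffer and
its methods are hypothetical names chosen for illustration, and it assumes
CRC32 (via java.util.zip.CRC32) over fixed-size chunks. The point is only that
a consumer such as HDFS could reuse the CRC handed to it with each buffer
instead of recomputing its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical pairing of a data buffer with the CRC32 computed over it. */
final class ChecksummedBuffer {
    final byte[] data;
    final long crc;

    ChecksummedBuffer(byte[] data, long crc) {
        this.data = data;
        this.crc = crc;
    }

    /** Computes the CRC32 of the given bytes. */
    static long crcOf(byte[] bytes) {
        CRC32 c = new CRC32();
        c.update(bytes, 0, bytes.length);
        return c.getValue();
    }

    /** Recomputes the CRC over the data and compares it with the stored one. */
    boolean isValid() {
        return crcOf(data) == crc;
    }

    /** Splits data into chunks of at most chunkSize bytes, each with its own CRC. */
    static List<ChecksummedBuffer> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<ChecksummedBuffer> out = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            byte[] chunk = Arrays.copyOfRange(data, off, end);
            out.add(new ChecksummedBuffer(chunk, crcOf(chunk)));
        }
        return out;
    }
}
```

A caller would use ChecksummedBuffer.split(bytes, 512) on the write path and
isValid() on the read path; a file system with its own checksum storage could
instead persist the crc field directly.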

> Nor is that URL very friendly...

I agree that is a problem with this proposal.  It would be better if users saw 
hdfs:, s3:, and file: URLs.


> CRC computation and reading should move into a nested FileSystem
> ----------------------------------------------------------------
>
>                 Key: HADOOP-746
>                 URL: http://issues.apache.org/jira/browse/HADOOP-746
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.8.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>
> Currently FileSystem provides both an interface and a mechanism for computing 
> and checking crc files. I propose splitting the crc code into a nestable 
> FileSystem that, like the PhasedFileSystem, has a backing FileSystem. Once the 
> Paths are converted to URIs, this is fairly natural to express. To use crc 
> files, your URIs will look like:
> crc://hdfs:%2f%2fhost1:8020/ which is a crc FileSystem with an underlying 
> file system of hdfs://host1:8020
> This will allow users to use crc files where they make sense for their 
> application/cluster and get rid of the "raw" methods.
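
The nested URI form in the description can be illustrated with a small Java
sketch. This is not code from Hadoop or from the proposal: NestedCrcUri, wrap,
and unwrap are hypothetical names, and the sketch assumes the only encoding
needed is replacing "/" in the underlying scheme and authority with "%2f", as
in the example crc://hdfs:%2f%2fhost1:8020/. It relies on java.net.URI
decoding escaped octets in getAuthority().

```java
import java.net.URI;

/** Hypothetical helper for the crc:// nested URI form described above. */
final class NestedCrcUri {
    private NestedCrcUri() {}

    /** Wraps an underlying file system URI, e.g. hdfs://host1:8020/, in a crc:// URI. */
    static URI wrap(URI inner) {
        String base = inner.getScheme() + "://" + inner.getRawAuthority();
        // Escape the slashes so the whole inner URI fits in the outer authority.
        String encoded = base.replace("/", "%2f");
        return URI.create("crc://" + encoded + inner.getRawPath());
    }

    /** Recovers the underlying file system URI from a crc:// URI. */
    static URI unwrap(URI outer) {
        if (!"crc".equals(outer.getScheme())) {
            throw new IllegalArgumentException("not a crc URI: " + outer);
        }
        // getAuthority() returns the authority with %2f decoded back to "/".
        return URI.create(outer.getAuthority() + outer.getRawPath());
    }
}
```

For example, NestedCrcUri.unwrap(URI.create("crc://hdfs:%2f%2fhost1:8020/"))
yields hdfs://host1:8020/, which is the backing file system the crc layer
would delegate to.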
