One way is to write a small program that does the diff at block level: open
both files, read the data at the same offset from each, and compare it. This
tells you which blocks differ at your chosen offset boundary, which is useful
for checking whether two files differ. There is also an open JIRA for getting
the checksum of a file, which would make this task trivial.
Lohit
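
A minimal sketch of the block-level comparison described above, in Python. It
assumes each file is available as a binary file-like object with a read()
method; how those streams are opened against HDFS depends on the client
library and is not shown. The names read_block and differing_blocks and the
64 KB default block size are made up for illustration.

```python
import io


def read_block(stream, size):
    """Read up to `size` bytes, looping because read() may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # end of stream
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def differing_blocks(stream_a, stream_b, block_size=64 * 1024):
    """Yield the start offset of each block in which the two streams differ."""
    offset = 0
    while True:
        a = read_block(stream_a, block_size)
        b = read_block(stream_b, block_size)
        if not a and not b:  # both streams exhausted
            return
        if a != b:  # also true when one stream ends before the other
            yield offset
        offset += block_size


if __name__ == "__main__":
    # In-memory streams stand in for two files opened on HDFS.
    left = io.BytesIO(b"aaaabbbbcccc")
    right = io.BytesIO(b"aaaabXbbcccc")
    print(list(differing_blocks(left, right, block_size=4)))  # prints [4]
```

This reports which fixed-size blocks differ; it is not a line-oriented diff.
An inserted or deleted byte shifts everything after it, so every later block
will be reported as different.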

On Sep 4, 2008, at 6:51 AM, "Andrey Pankov" <[EMAIL PROTECTED]> wrote:

Hello,

Does anyone know whether it is possible to compare data on HDFS without
copying it to the local box? I mean, if I'd like to find the differences
between local text files, I can use the diff command. If the files are on
HDFS, then I have to get them from HDFS to the local box and only then run
diff. Copying files to the local fs is a bit annoying and could be
problematic when the files are huge, say 2-5 GB.

Thanks in advance.

-- 
Andrey Pankov
