One way is to write a small program that does the diff at block level. Open both files, read data from the same offsets, and compare it. This tells you which blocks differ, at the granularity of your offset boundaries, and is useful for checking whether two files differ at all. There is also an open JIRA that would let you get the checksum of a file, which would make this task trivial.
Lohit
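The block-level comparison described in this reply could be sketched roughly as follows. This is only an illustration under stated assumptions: it works on any two binary file-like objects, and how you open a stream onto an HDFS file depends on the client library you use, which is not shown here. The names `read_full`, `differing_offsets`, and `BLOCK_SIZE` are hypothetical and not part of any Hadoop API.

```python
import io
from typing import BinaryIO, Iterator

# Hypothetical comparison granularity; pick whatever suits your data.
BLOCK_SIZE = 64 * 1024


def read_full(stream: BinaryIO, size: int) -> bytes:
    """Read `size` bytes unless EOF comes first.

    Remote streams may return fewer bytes than requested, so keep
    reading until the block is full or the stream is exhausted.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # EOF
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def differing_offsets(
    a: BinaryIO, b: BinaryIO, block_size: int = BLOCK_SIZE
) -> Iterator[int]:
    """Yield the starting offset of every block where the streams differ."""
    offset = 0
    while True:
        block_a = read_full(a, block_size)
        block_b = read_full(b, block_size)
        if not block_a and not block_b:
            return  # both streams are exhausted
        if block_a != block_b:
            yield offset
        offset += block_size


if __name__ == "__main__":
    # In-memory stand-ins for two streams opened on remote files.
    left = io.BytesIO(b"aaaabbbbcccc")
    right = io.BytesIO(b"aaaabxbbcccc")
    for off in differing_offsets(left, right, block_size=4):
        print(f"blocks differ at offset {off}")
```

Note that this reports only which fixed-size blocks differ. Unlike the diff command, it does not realign after an insertion or deletion, so a single added byte makes every later block show up as different.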
On Sep 4, 2008, at 6:51 AM, "Andrey Pankov" <[EMAIL PROTECTED]> wrote:
Hello,
Does anyone know whether it is possible to compare data on HDFS without copying it to a local box? If I want to find the differences between local text files, I can use the diff command. If the files are on HDFS, I have to fetch them to the local box first and only then run diff. Copying files to the local fs is a bit annoying and could be problematic when the files are huge, say 2-5 GB.
Thanks in advance.
--
Andrey Pankov