Hi Matt,

On Jun 20, 2011, at 1:46pm, GOEKE, MATTHEW (AG/1000) wrote:
> Has anyone else run into issues using output compression (in our case lzo) on
> TestDFSIO and it failing to be able to read the metrics file? I just assumed
> that it would use the correct decompression codec after it finishes but it
> always returns with a 'File not found' exception.

Yes, I've run into the same issue on 0.20.2 and CDH3u0.

I don't see any Jira issue that covers this problem, so unless I hear
otherwise I'll file one. The problem is that the post-job code doesn't
handle fetching the <path>.deflate or <path>.lzo (in your case) file from
HDFS and decompressing it.

> Is there a simple way around this without spending the time to recompile a
> cluster/codec specific version?

You can use "hadoop fs -text <path reported in exception>.lzo"

This will dump out the file, which looks like:

f:rate	171455.11
f:sqrate	2981174.8
l:size	10485760000
l:tasks	10
l:time	590537

If you take f:rate/1000/l:tasks, that should give you the average MB/sec.
E.g. for the example above, that would be 171455/1000/10 = 17 MB/sec.

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
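If it helps, here's a minimal sketch (my own, not part of TestDFSIO) that parses the key/value lines dumped by "hadoop fs -text" and applies the f:rate/1000/l:tasks formula from the message above:

```python
def parse_metrics(text):
    """Parse whitespace-separated 'key value' metrics lines into a dict of floats."""
    metrics = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        metrics[key] = float(value)
    return metrics

def avg_mb_per_sec(metrics):
    """Average MB/sec per the formula in the message: f:rate / 1000 / l:tasks."""
    return metrics["f:rate"] / 1000 / metrics["l:tasks"]

# Example output as shown above
sample = """f:rate 171455.11
f:sqrate 2981174.8
l:size 10485760000
l:tasks 10
l:time 590537"""

print(round(avg_mb_per_sec(parse_metrics(sample)), 1))  # -> 17.1
```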