TestDFSIO failure

2011-06-20 Thread GOEKE, MATTHEW (AG/1000)
Has anyone else run into issues using output compression (in our case lzo) with 
TestDFSIO, where it then fails to read the metrics file? I assumed it would use 
the correct decompression codec once the job finishes, but it always returns a 
'File not found' exception. Is there a simple way around this without spending 
the time to recompile a cluster/codec-specific version?

Matt


Re: TestDFSIO failure

2011-09-01 Thread Ken Krugler
Hi Matt,

On Jun 20, 2011, at 1:46pm, GOEKE, MATTHEW (AG/1000) wrote:

> Has anyone else run into issues using output compression (in our case lzo) with 
> TestDFSIO, where it then fails to read the metrics file? I assumed it would use 
> the correct decompression codec once the job finishes, but it always returns a 
> 'File not found' exception.

Yes, I've run into the same issue on 0.20.2 and CDH3u0.

I don't see any Jira issue that covers this problem, so unless I hear otherwise 
I'll file one.

The problem is that the post-job code doesn't handle getting the .deflate 
or .lzo (for you) file from HDFS, and then decompressing it.

> Is there a simple way around this without spending the time to recompile a 
> cluster/codec specific version?


You can use "hadoop fs -text .lzo"

This will dump out the file, which looks like:

f:rate  171455.11
f:sqrate  2981174.8
l:size  1048576
l:tasks 10
l:time  590537

If you take f:rate/1000/l:tasks, that should give you the average MB/sec.

E.g. for the example above, that would be 171455.11/1000/10, or about 17 MB/sec.
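
If you'd rather not do the arithmetic by hand, here is a minimal Python sketch 
of the same calculation. It assumes the dump has one "type:name value" pair per 
line, separated by whitespace, with "f:" marking a float and "l:" an integer. 
The function names parse_metrics and average_mb_per_sec are made up for this 
example and are not part of Hadoop.

```python
def parse_metrics(text):
    """Parse 'type:name value' lines into a dict keyed by name.

    Assumption: 'f:' values are floats and 'l:' values are integers;
    any other prefix is kept as a string. Malformed lines are skipped.
    """
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line or a line missing its separator
        key, value = parts
        kind, _, name = key.partition(":")
        if kind == "f":
            metrics[name] = float(value)
        elif kind == "l":
            metrics[name] = int(value)
        else:
            metrics[name] = value
    return metrics


def average_mb_per_sec(metrics):
    """Apply the rule of thumb from this thread: rate / 1000 / tasks."""
    return metrics["rate"] / 1000 / metrics["tasks"]


# Sample values copied from the dump shown earlier in this message.
sample = """f:rate\t171455.11
f:sqrate\t2981174.8
l:size\t1048576
l:tasks\t10
l:time\t590537
"""

if __name__ == "__main__":
    result = average_mb_per_sec(parse_metrics(sample))
    print(f"{result:.1f} MB/sec")  # prints "17.1 MB/sec"
```

You could save the output of the "hadoop fs -text" command to a local file and 
pass its contents to parse_metrics instead of the hard-coded sample.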

-- Ken

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr