Re: Iostat on Hadoop

2011-03-16 Thread Jérôme Thièvre INA
Hi Matthew,

You can use iostat -xm 2 to monitor disk usage (extended statistics, in MB/s,
refreshed every 2 seconds).
Look at the %util column. When it stays between 90% and 100% for some devices,
processes start to block in disk sleep (uninterruptible I/O wait) and the
system load may become excessive.
Use htop to spot those processes: sort on the S (state) column and watch for
the D status.
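
The %util check described above can also be scripted. What follows is only a minimal sketch in Java, assuming captured "iostat -x" text is passed in as a string. The class IostatBusyDevices and its methods are hypothetical names, not part of Hadoop or sysstat. The %util column is located by name in each "Device" header line because its position differs between sysstat versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hadoop or sysstat.
class IostatBusyDevices {

    /** Returns the highest %util seen per device across all reports in the text. */
    static Map<String, Double> peakUtil(String iostatOutput) {
        Map<String, Double> peak = new LinkedHashMap<>();
        int utilColumn = -1; // unknown until a "Device" header line is seen
        for (String line : iostatOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                utilColumn = -1; // a blank line ends the current device table
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            if (fields[0].startsWith("Device")) {
                // Find %util by name instead of assuming a fixed position.
                utilColumn = Arrays.asList(fields).indexOf("%util");
                continue;
            }
            if (utilColumn < 0 || fields.length <= utilColumn) {
                continue; // CPU summary or another non-device line
            }
            try {
                // Some locales print a decimal comma.
                double util = Double.parseDouble(fields[utilColumn].replace(',', '.'));
                peak.merge(fields[0], util, Math::max);
            } catch (NumberFormatException e) {
                // Not a numeric device row; skip it.
            }
        }
        return peak;
    }

    /** Lists devices whose peak %util is at or above the given threshold. */
    static List<String> busyDevices(String iostatOutput, double thresholdPercent) {
        List<String> busy = new ArrayList<>();
        for (Map.Entry<String, Double> e : peakUtil(iostatOutput).entrySet()) {
            if (e.getValue() >= thresholdPercent) {
                busy.add(e.getKey());
            }
        }
        return busy;
    }
}
```

Calling IostatBusyDevices.busyDevices(text, 90.0) on output captured while a job runs gives the devices that reached the 90% mark at least once. The first report iostat prints covers the time since boot, so it may be worth leaving it out of the captured text.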

Jérôme Thièvre

2011/3/16 Matthew John tmatthewjohn1...@gmail.com

 Hi all,

 Can someone give pointers on using iostat to account for I/O overhead
 (disk reads/writes) in a MapReduce job?

 Matthew John



Re: How to manage large record in MapReduce

2011-01-07 Thread Jérôme Thièvre INA
Hi Sonal,

Thank you. I have just implemented a solution similar to yours (without the
copy to a temp file that I suggested in my initial post), and it seems to
work.
Best Regards,

Jérôme

2011/1/7 Sonal Goyal sonalgoy...@gmail.com

 Jerome,

 You can take a look at FileStreamInputFormat at

 https://github.com/sonalgoyal/hiho/tree/hihoApache0.20/src/co/nubetech/hiho/mapreduce/lib/input

 This provides an input stream per file. In our case, we are using the input
 stream to load data into the database directly. Maybe you can use this or a
 similar approach for working with your videos.

 HTH

 Thanks and Regards,
 Sonal
 Connect Hadoop with databases, Salesforce, FTP servers and others:
 https://github.com/sonalgoyal/hiho
 Nube Technologies http://www.nubetech.co

 http://in.linkedin.com/in/sonalgoyal





 On Thu, Jan 6, 2011 at 4:23 PM, Jérôme Thièvre jthie...@gmail.com wrote:

  Hi,
  
  we are currently using Hadoop (version 0.20.2) to manage some web archiving
  processes such as full-text indexing, and it works very well with small
  records that contain HTML.
  Now we would like to work with other types of web data, such as videos.
  This kind of data can be really large, and of course these records don't
  fit in memory.
  
  Is it possible to manage records whose content doesn't reside in memory
  but on disk?
  One possibility would be to implement a Writable that reads its content
  from a DataInput but doesn't load it into memory; instead it would copy
  that content to a temporary file in the local file system and allow it to
  be streamed through an InputStream (an InputStreamWritable).
  
  Has anybody tested a similar approach? If not, do you think this method
  could cause big problems (ones that impact performance)?
 
  Thanks,
 
  Jérôme Thièvre
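
The temp-file idea raised in the original question can be sketched as follows. This is only an illustration of the idea as posed, not the solution that was finally implemented in this thread. To stay self-contained, the class does not import Hadoop; its readFields and write methods mirror the signatures of org.apache.hadoop.io.Writable, so adding "implements Writable" would be the intended use. The class name SpillToDiskRecord, the length-prefixed wire format and the 64 KB buffer are assumptions made for the example, and the code relies on Java 7+ APIs (java.nio.file, try-with-resources).

```java
import java.io.BufferedInputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical record type; in Hadoop it would declare "implements Writable".
class SpillToDiskRecord {
    private static final int BUFFER_SIZE = 64 * 1024; // assumed chunk size

    private Path tempFile; // local copy of the record content
    private long length;   // content length in bytes

    /** Mirrors Writable.readFields: copies the content to a temp file in chunks. */
    public void readFields(DataInput in) throws IOException {
        discard(); // Hadoop reuses Writable instances, so drop the previous content
        long expected = in.readLong(); // assumed format: 8-byte length, then bytes
        tempFile = Files.createTempFile("record-", ".bin");
        byte[] buffer = new byte[BUFFER_SIZE];
        try (OutputStream out = Files.newOutputStream(tempFile)) {
            long remaining = expected;
            while (remaining > 0) {
                int chunk = (int) Math.min(buffer.length, remaining);
                in.readFully(buffer, 0, chunk);
                out.write(buffer, 0, chunk);
                remaining -= chunk;
            }
        }
        length = expected;
    }

    /** Mirrors Writable.write: streams the temp file back out in chunks. */
    public void write(DataOutput out) throws IOException {
        out.writeLong(length);
        if (length == 0) {
            return; // nothing to stream (the record may never have been read)
        }
        byte[] buffer = new byte[BUFFER_SIZE];
        try (InputStream in = openStream()) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    /** Gives streaming access to the content without loading it into memory. */
    public InputStream openStream() throws IOException {
        if (tempFile == null) {
            throw new IllegalStateException("no content has been read yet");
        }
        return new BufferedInputStream(Files.newInputStream(tempFile));
    }

    public long getLength() {
        return length;
    }

    /** Deletes the temp file; callers should invoke this when done with the record. */
    public void discard() throws IOException {
        if (tempFile != null) {
            Files.deleteIfExists(tempFile);
            tempFile = null;
        }
        length = 0;
    }
}
```

Only one buffer of content is held in memory at a time, at the cost of an extra local disk write and read per record. The sketch does not address how such records behave in the sort and shuffle phases, which is where the performance question from the post would need to be measured.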