Re: How do I diagnose IO bounded errors using the framework counters?

2011-10-05 Thread John Meagher
The counter names are created dynamically in mapred.Task

  /**
   * Counters to measure the usage of the different file systems.
   * Always return the String array with two elements. First one is the name of
   * BYTES_READ counter and second one is of the BYTES_WRITTEN counter.
   */
  protected static String[] getFileSystemCounterNames(String uriScheme) {
    String scheme = uriScheme.toUpperCase();
    return new String[]{scheme+"_BYTES_READ", scheme+"_BYTES_WRITTEN"};
  }
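
This is also why grep can't find FILE_BYTES_READ in the main source: the name
is built at runtime from the file system's URI scheme ("file" for the local
file system, "hdfs" for HDFS). A minimal standalone sketch of that logic (it
just mirrors the method above; it is not part of mapred.Task):

  // Standalone sketch only; mirrors getFileSystemCounterNames() above.
  public class CounterNameDemo {
    static String[] fileSystemCounterNames(String uriScheme) {
      String scheme = uriScheme.toUpperCase();
      return new String[]{scheme + "_BYTES_READ", scheme + "_BYTES_WRITTEN"};
    }

    public static void main(String[] args) {
      // Prints FILE_BYTES_READ, FILE_BYTES_WRITTEN, HDFS_BYTES_READ, HDFS_BYTES_WRITTEN
      for (String scheme : new String[]{"file", "hdfs"}) {
        for (String name : fileSystemCounterNames(scheme)) {
          System.out.println(name);
        }
      }
    }
  }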


On Tue, Oct 4, 2011 at 17:22, W.P. McNeill  wrote:
> Here's an even more basic question. I tried to figure out what
> FILE_BYTES_READ means by searching every file in the hadoop 0.20.203.0
> installation for the string FILE_BYTES_READ by running
>
>      find . -type f | xargs grep FILE_BYTES_READ
>
> I only found this string in source files in the vaidya contributor directory
> and the tools/rumen directories. Nothing in the main source base.
>
> Where in the source code are these counters created and updated?
>


Fixing Mis-replicated blocks

2011-10-20 Thread John Meagher
After a hardware move with an unfortunately misconfigured rack awareness
script, our Hadoop cluster has a large number of mis-replicated blocks.
After about a week things haven't gotten better on their own.

Is there a good way to trigger the name node to fix the mis-replicated blocks?

Here's what I'm using for now, but it is very slow:
for f in `hadoop fsck / | grep "Replica placement policy is violated" |
    head -n3000 | awk -F: '{print $1}'`; do
    hadoop fs -setrep 4 $f
    hadoop fs -setrep 3 $f
done

John


Re: Fixing Mis-replicated blocks

2011-10-21 Thread John Meagher
In this case everything should have a replication factor of 3.  I was
hoping there was a quicker way.  The -w option should help so this doesn't
need to be run again.
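
For the variable-replication case Jeff describes below, something along these
lines against the FileSystem API should work (rough, untested sketch; no error
handling, and like -setrep without -w it doesn't wait for re-replication to
finish unless you add that):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class FixReplication {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      // Each argument is a path reported by fsck as violating the placement policy.
      for (String arg : args) {
        Path path = new Path(arg);
        FileStatus status = fs.getFileStatus(path);
        short current = status.getReplication();
        // Bump the factor by one to force a new, policy-compliant replica...
        fs.setReplication(path, (short) (current + 1));
        // ...then drop it back to whatever it was before.
        fs.setReplication(path, current);
      }
    }
  }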

On Thu, Oct 20, 2011 at 20:26, Jeff Bean  wrote:
> Do setrep -w on the increase to force the new replica before decreasing
> again.
>
> Of course, the little script only works if the replication factor is 3 on
> all the files. If it's a variable amount you should use the java API to get
> the existing factor and then increase by one and then decrease.
>
> Jeff
>
> On Thu, Oct 20, 2011 at 8:44 AM, John Meagher wrote:
>
>> After a hardware move with an unfortunately misconfigured rack awareness
>> script, our Hadoop cluster has a large number of mis-replicated blocks.
>> After about a week things haven't gotten better on their own.
>>
>> Is there a good way to trigger the name node to fix the mis-replicated
>> blocks?
>>
>> Here's what I'm using for now, but it is very slow:
>> for f in `hadoop fsck / | grep "Replica placement policy is violated" |
>>     head -n3000 | awk -F: '{print $1}'`; do
>>     hadoop fs -setrep 4 $f
>>     hadoop fs -setrep 3 $f
>> done
>>
>> John
>>
>


Re: How to find out whether a node is Overloaded from Cpu utilization ?

2012-01-18 Thread John Meagher
The problem I've run into more often than memory pressure is system CPU
time getting out of control.  My guess is that the threshold for what is
considered "overloaded" will depend on your system setup, what you're
running on it, and what your jobs are bound by.
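
If you want a quick way to watch that on a node, here's a rough sketch
(Linux only, just reading /proc/stat; sar or top will give you the same
numbers) that shows how much of the CPU time over a short window was user
vs system time:

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;

  // Rough sketch: sample the aggregate "cpu" line of /proc/stat twice and
  // report what fraction of the elapsed jiffies were user vs system time.
  public class CpuSample {
    // Returns {user+nice, system, total} jiffies.
    static long[] read() throws IOException {
      try (BufferedReader r = new BufferedReader(new FileReader("/proc/stat"))) {
        String[] f = r.readLine().trim().split("\\s+");
        long user = Long.parseLong(f[1]) + Long.parseLong(f[2]);
        long system = Long.parseLong(f[3]);
        long total = 0;
        for (int i = 1; i < f.length; i++) total += Long.parseLong(f[i]);
        return new long[]{user, system, total};
      }
    }

    public static void main(String[] args) throws Exception {
      long[] a = read();
      Thread.sleep(5000);
      long[] b = read();
      double dt = b[2] - a[2];
      System.out.printf("user %.1f%%  system %.1f%%%n",
          100.0 * (b[0] - a[0]) / dt, 100.0 * (b[1] - a[1]) / dt);
    }
  }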


On Tue, Jan 17, 2012 at 22:06, ArunKumar  wrote:
>
>
> Guys!
>
> So can I say that if memory usage is more than, say, 90% the node is
> overloaded?
> If so, what should that threshold percentage be, or how can we find it?
>
>
>
> Arun
>
>
>


Re: rack awareness and safemode

2012-03-20 Thread John Meagher
Unless something has changed recently it won't automatically relocate
the blocks.  When I did something similar I had a script that walked
through the whole set of files that were misreplicated and increased
the replication factor then dropped it back down.  This triggered
relocation of blocks to meet the rack requirements.

Doing this worked, but took about a week to run over a few hundred
thousand files that were misreplicated.

Here's the script I used (all sorts of caveats about it assuming a
replication factor of 3 and no real error handling, etc)...

for f in `hadoop fsck / | grep "Replica placement policy is violated" |
    head -n8 | awk -F: '{print $1}'`; do
    hadoop fs -setrep -w 4 $f
    hadoop fs -setrep 3 $f
done


On Tue, Mar 20, 2012 at 16:20, Patai Sangbutsarakum wrote:
> Hadoopers!!
>
> I am going to restart the hadoop cluster in order to enable rack-awareness
> for the first time.
> Currently we're running 0.20.203 with 500TB of data on 250+ nodes
> (without rack-awareness)
>
> I am afraid that when I start DFS (with rack-awareness enabled) HDFS
> will be in safemode for hours, busy relocating blocks to comply with
> rack awareness.
>
> Is there any knob I can dial to prevent that?
>
> Thanks in advance
> Patai


Re: rack awareness and safemode

2012-03-22 Thread John Meagher
Make sure you run "hadoop fsck /".  It should report a lot of blocks
with the replication policy violated.  In the short term it isn't
anything to worry about and everything will work fine even with those
errors.  Run the script I sent out earlier to fix those errors and
bring everything into compliance with the new rack awareness setup.


On Thu, Mar 22, 2012 at 13:36, Patai Sangbutsarakum wrote:
> I restarted the cluster yesterday with rack-awareness enabled.
> Things went well. I can confirm that there were no issues at all.
>
> Thank you all again.
>
>
> On Tue, Mar 20, 2012 at 4:19 PM, Patai Sangbutsarakum
>  wrote:
>> Thank you all.
>>
>>
>> On Tue, Mar 20, 2012 at 2:44 PM, Harsh J  wrote:
>>> John has already addressed your concern. I'd only like to add that
>>> fixing of replication violations does not require your NN to be in
>>> safe mode and it won't be. Your worry can hence be voided :)
>>>
>>> On Wed, Mar 21, 2012 at 2:08 AM, Patai Sangbutsarakum
>>>  wrote:
>>>> Thanks for your reply and script. Hopefully it still applies to 0.20.203.
>>>> As far as I can tell from playing with a test cluster, the balancer takes
>>>> care of replica placement.
>>>> I just don't want to fall into the situation where HDFS sits in safemode
>>>> for hours and users can't use Hadoop and start yelping.
>>>>
>>>> Let's hear from others.
>>>>
>>>>
>>>> Thanks
>>>> Patai
>>>>
>>>>
>>>> On 3/20/12 1:27 PM, "John Meagher"  wrote:
>>>>
>>>>>Here's the script I used (all sorts of caveats about it assuming a
>>>>>replication factor of 3 and no real error handling, etc)...
>>>>>
>>>>>for f in `hadoop fsck / | grep "Replica placement policy is violated" |
>>>>>    head -n8 | awk -F: '{print $1}'`; do
>>>>>    hadoop fs -setrep -w 4 $f
>>>>>    hadoop fs -setrep 3 $f
>>>>>done
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J


Re: Applications creates bigger output than input?

2011-04-29 Thread John Meagher
Another case is augmenting data.  This is sometimes done outside of MR
in an ETL flow, but can be done as an MR job.  Doing it that way uses
Hadoop to handle the scaling issues, but it really isn't what MR is
intended for.

A real example of this is:

* Input: standard apache weblog
* Data added...
  - Geolocation of IP
  - Decoding URL
  - Adding information based on visited URL / Ref URL ...
  - Adding information based on the user
* Output complex binary object to a sequence file
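
As a concrete (if simplified) sketch of that pattern: a map-only job that
appends fields to each weblog record produces output strictly larger than its
input. The lookupGeo() helper below is just a placeholder for whatever
enrichment source is used, and a real job would emit a richer object to a
SequenceFile rather than tab-separated text:

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Map-only sketch: each weblog line comes out larger than it went in
  // because extra fields are appended to it.
  public class AugmentLogMapper
      extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String record = line.toString();
      // Append the enrichment fields to the original record.
      String augmented = record + "\t" + lookupGeo(record);
      context.write(NullWritable.get(), new Text(augmented));
    }

    // Placeholder enrichment; a real job would hit a geo/IP database,
    // decode URLs, look up user information, etc.
    private String lookupGeo(String record) {
      return "geo=UNKNOWN";
    }
  }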


On Fri, Apr 29, 2011 at 08:02, elton sky  wrote:
> One of the assumptions MapReduce makes, I think, is that the size of a map's
> output is smaller than its input, although many applications produce output
> the same size as the input, like sort, merge, etc.
> For my benchmarking purposes, I am looking for some non-trivial, real-life
> applications which create *bigger* output than their input. A trivial example
> I can think of is a cross join...
>
> I would really appreciate it if you shared your knowledge with me.
>