Re: Yarn mapreduce Logging : syslog vs stderr log files

2018-03-20 Thread Sultan Alamro

Anything written through the log4j logger goes to syslog, e.g. LOG.info("text")
—> syslog, because the container's log4j configuration routes logger output
there. Anything a task writes directly to standard error (or that a library
logs via a console appender) ends up in the stderr file.
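
A minimal sketch of the two destinations inside a task JVM (the class name is
illustrative; commons-logging is what the MR framework itself uses):

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LogDestinationDemo {
    private static final Log LOG = LogFactory.getLog(LogDestinationDemo.class);

    public static void main(String[] args) {
        // Routed by the container's log4j configuration into the syslog file.
        LOG.info("this line lands in syslog");
        // Raw writes to the process streams land in the stdout/stderr files.
        System.err.println("this line lands in stderr");
    }
}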

> On Mar 20, 2018, at 9:02 PM, chandan prakash  
> wrote:
> 
> Hi All,
> Currently my YARN MR job is writing logs to syslog and stderr.
> I want to know:
> How is it decided which log goes to syslog and which goes to stderr?
> Can I redirect logs to syslog instead of stderr?
> If YES: how?
> If NO: Can we ensure log rolling for stderr as well, as we do for
> syslog using the mapreduce.job.log4j-properties-file property?
> 
> Actually, most of my logs are going to stderr, and I want to ensure
> log rolling for them while the application is up.
> Any leads will be highly appreciated.
> Thanks in advance.
> 
> Regards,
> -- 
> Chandan Prakash
> 


Re: How to print values in console while running MapReduce application

2017-10-04 Thread Sultan Alamro
Hi,

The easiest way is to open a new window and display the log file as follows:
tail -f /path/to/log/file.log
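
If you only need a few values, user-defined counters are another easy route:
the framework aggregates them and prints them in the console job summary when
the job finishes. A minimal sketch (the counter group and name are
illustrative):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DebugWordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Counters are aggregated by the framework and printed in the
        // console job summary when the job completes.
        if (value.toString().isEmpty()) {
            context.getCounter("Debug", "EmptyLines").increment(1);
        }
        // ... normal word-count map logic ...
    }
}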

Best,
Sultan

> On Oct 4, 2017, at 5:20 PM, Tanvir Rahman  wrote:
> 
> Hello,
> I have a small cluster and I am running the MapReduce WordCount application
> on it.
> I want to print some variable values in the console (where you can see the
> map and reduce progress and other job information) for debugging purposes
> while running the MapReduce application. What is the easiest way to do that?
> 
> Thanks in advance.
> 
> Tanvir
> 


Re: Physical memory (bytes) snapshot counter question - how to get maximum memory used in reduce task

2017-04-05 Thread Sultan Alamro
Hi Nico,

Did you check the .jhist file?
It has the full details of each task.
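
If you would rather pull the counter programmatically than parse the .jhist
file, here is a hedged sketch using the MapReduce client API (it assumes you
hold a handle to the completed job):

import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class MemoryCounterDemo {
    // job is assumed to be a handle to a completed MapReduce job.
    public static void printPhysicalMemory(Job job) throws Exception {
        Counters counters = job.getCounters();
        long bytes =
            counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).getValue();
        // Note: at the job level this is an aggregate over tasks, not a
        // per-task maximum.
        System.out.println("PHYSICAL_MEMORY_BYTES = " + bytes);
    }
}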

Best,
Sultan

> On Apr 5, 2017, at 9:15 PM, Nico Pappagianis 
>  wrote:
> 
> Hi all,
> 
> I've made some memory optimizations on the reduce task and I would like to 
> compare the old reducer vs new reducer in terms of maximum memory consumption.
> 
> I have a question regarding the description of the following counter:
> 
> PHYSICAL_MEMORY_BYTES | Physical memory (bytes) snapshot | Total physical 
> memory used by all tasks including spilled data.
> 
> I'm assuming this means the aggregate of memory used throughout the entire 
> reduce task (if viewing at the reduce task-level). 
> Please correct me if I'm wrong on this assumption (the description seems 
> pretty straightforward).
> 
> Is there a way to get the maximum (not total) memory used by a reduce task 
> from the default counters?
> 
> Thanks!
> 
> 
> 
> 




Re: Replacement of Hadoop-ec2 script

2017-02-26 Thread Sultan Alamro
Check out this link

https://blog.insightdatascience.com/spinning-up-a-free-hadoop-cluster-step-by-step-c406d56bae42#.9n2u8myxt

On Tue, Feb 21, 2017 at 10:14 AM, Shiyuan  wrote:

> Hi Hadoop Users,
>
> The script to set up Hadoop on EC2 described in
> https://wiki.apache.org/hadoop/AmazonEC2 has been removed from recent Hadoop
> releases. Google points me to an alternative, http://whirr.apache.org/, which
> has also been retired for more than a year. Is there a replacement or
> alternative that is still good for setting up the latest version of Hadoop
> on EC2? Thank you!
>
>


Re: Multiple config files after building hadoop src code, Which one to modify?

2017-02-13 Thread Sultan Alamro
After compiling the source code, you can find them here:

$HADOOP_HOME/etc/hadoop/

For a source build, that corresponds to the etc/hadoop/ directory of the
distribution you actually run from, e.g. the
hadoop-dist/target/hadoop-2.7.3/etc/hadoop/ copy in your listing.
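
To double-check which file a live Configuration actually picked a property up
from, Configuration can report a property's sources. A minimal sketch (the
property name is just an example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ConfigSourceDemo {
    public static void main(String[] args) {
        // YarnConfiguration loads yarn-default.xml and yarn-site.xml
        // from the classpath.
        Configuration conf = new YarnConfiguration();
        String key = "yarn.nodemanager.resource.memory-mb";
        String[] sources = conf.getPropertySources(key);
        System.out.println(key + " = " + conf.get(key));
        if (sources != null) {
            for (String source : sources) {
                System.out.println("  set from: " + source);
            }
        }
    }
}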

> On Feb 13, 2017, at 4:17 PM, Tanvir Rahman  wrote:
> 
> Hello everyone,
> I am currently working on a research project where I need to understand the
> YARN MapReduce ApplicationMaster code in Hadoop 2.7.3. I have downloaded the
> hadoop-2.7.3 source code and built it; I have a local Hadoop configuration on
> my machine and can successfully run the wordcount application with that copy.
> 
> I am confused about the exact location of the configuration files. For
> example, if I want to modify the yarn-site.xml file, there are quite a few
> copies of yarn-site.xml on my local machine.
> 
> tanvir@dhcp143:~> locate yarn-site.xml
> /home/tanvir/backup-hadoop-2.7.3-src-config/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-common-project/hadoop-common/target/hadoop-common-2.7.3/etc/hadoop/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/etc/hadoop/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/share/hadoop/tools/sls/sample-conf/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-tools/hadoop-sls/src/test/resources/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-tools/hadoop-sls/target/hadoop-sls-2.7.3/sls/sample-conf/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-tools/hadoop-sls/target/test-classes/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-tools/hadoop-tools-dist/target/hadoop-tools-dist-2.7.3/share/hadoop/tools/sls/sample-conf/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/conf/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/conf/yarn-site.xml~
> /home/tanvir/hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/test-classes/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/target/test-classes/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-yarn-project/target/hadoop-yarn-project-2.7.3/etc/hadoop/yarn-site.xml
> /home/tanvir/hadoop-2.7.3-src/hadoop-yarn-project/target/hadoop-yarn-project-2.7.3/etc/hadoop/yarn-site.xml~
> 
> 
> Currently I am changing all of them (skipping the test cases and templates),
> and it works, but I wonder which one I should actually modify.
> 
> 
> Thanks in advance.
> Tanvir
> 


Re: Heartbeat between RM and AM

2017-01-06 Thread Sultan Alamro
Hi Sunil,

Thank you for your reply.

I don't mean the heartbeat interval. To my knowledge, the AM reports a job's
progress every t ms, and I need to add more information to the messages
exchanged between the AM and RM.
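
For context on where those messages live: the heartbeat itself is the
allocate() round trip on ApplicationMasterProtocol, so carrying extra values
between the RM and AM would mean extending the AllocateRequest and
AllocateResponse records (and their protobuf definitions). A hedged sketch of
the call shape, with the surrounding client setup omitted:

import java.util.List;
import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class HeartbeatSketch {
    // One AM -> RM heartbeat; extra RM-to-AM data would ride
    // in AllocateResponse.
    static AllocateResponse heartbeat(ApplicationMasterProtocol rm,
                                      int responseId, float progress,
                                      List<ResourceRequest> ask,
                                      List<ContainerId> release)
            throws Exception {
        AllocateRequest request =
            AllocateRequest.newInstance(responseId, progress, ask, release, null);
        return rm.allocate(request);
    }
}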



On Mon, Jan 2, 2017 at 4:56 AM, Sunil Govind <sunil.gov...@gmail.com> wrote:

> Hi
>
> If you are thinking about the allocation-request heartbeat calls from the AM
> to the RM, that is mostly driven at the application level (not a YARN-wide
> config). For example, in MapReduce the following config is used for this:
> yarn.app.mapreduce.am.scheduler.heartbeat.interval-ms
>
> Thanks
> Sunil
>
>
> On Sat, Dec 31, 2016 at 8:20 AM Sultan Alamro <sultan.ala...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Can anyone tell me how I can modify the heartbeat between the RM and AM?
>> I need to add new requests to the AM from the RM.
>>
>> These requests basically are values calculated by the RM to be used by
>> the AM online.
>>
>> Thanks,
>> Sultan
>>
>


Heartbeat between RM and AM

2016-12-30 Thread Sultan Alamro
Hi all,

Can anyone tell me how I can modify the heartbeat between the RM and AM? I
need to add new requests to the AM from the RM.

These requests basically are values calculated by the RM to be used by the
AM online.

Thanks,
Sultan


Re: Configuration per job

2016-10-25 Thread Sultan Alamro

You might need to look at the scheduler's configuration. 
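
Note that yarn.scheduler.minimum-allocation-mb itself is an RM/scheduler
setting, so it cannot be changed per job; job-scoped properties (for example
the task memory requests) can be set on the job's Configuration in main(). A
minimal sketch (the property value and job name are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PerJobConfigDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job-scoped: honored per job. RM-side settings such as
        // yarn.scheduler.minimum-allocation-mb are not job-scoped and
        // are ignored if set here.
        conf.set("mapreduce.map.memory.mb", "2048");
        Job job = Job.getInstance(conf, "per-job-config-demo");
        // ... set jar, mapper/reducer, input/output paths, then submit ...
    }
}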

> On Oct 25, 2016, at 10:06 PM, Jeff Zhang  wrote:
> 
> No, this is RM configuration, which is applied to all jobs.
> 
> 
> 
> 정현진 wrote on Wed, Oct 26, 2016 at 7:23 AM:
>> Hi.
>> 
>> Is it possible to change a property per job in *-site.xml, such as
>> yarn.scheduler.minimum-allocation-mb?
>> 
>> I want to set the value of yarn.scheduler.minimum-allocation-mb in my
>> MapReduce application's main() function.
>> 
>> Jung


Map Task Execution Time

2016-05-27 Thread Sultan Alamro
Hi there,

By looking at the .jhist file of a job, I see that there are a startTime and
a finishTime for each map task.

My question is: is the time spent reading the input data (local or remote)
included in the execution time?


Thanks,
Sultan


Re: how to use Yarn API to find task/attempt status

2016-03-09 Thread Sultan Alamro

You can still see the task status through the web interfaces.

Look at the end of this page
https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/ClusterSetup.html
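
For standard M/R jobs there is also a programmatic route through the MapReduce
client API (rather than YARN's, which only sees containers, as noted below). A
hedged sketch that tallies task states across the jobs the cluster still knows
about:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.mapreduce.TaskReport;
import org.apache.hadoop.mapreduce.TaskType;

public class TaskTally {
    public static void main(String[] args) throws Exception {
        Cluster cluster = new Cluster(new Configuration());
        for (JobStatus status : cluster.getAllJobStatuses()) {
            Job job = cluster.getJob(status.getJobID());
            if (job == null) continue;  // job may already have been purged
            int pending = 0, running = 0, complete = 0;
            for (TaskType type : new TaskType[] {TaskType.MAP, TaskType.REDUCE}) {
                for (TaskReport report : job.getTaskReports(type)) {
                    switch (report.getCurrentStatus()) {
                        case PENDING:  pending++;  break;
                        case RUNNING:  running++;  break;
                        case COMPLETE: complete++; break;
                        default: break;  // KILLED / FAILED
                    }
                }
            }
            System.out.println(status.getJobID() + ": pending=" + pending
                + " running=" + running + " complete=" + complete);
        }
    }
}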

> On Mar 10, 2016, at 12:58 AM, Frank Luo  wrote:
> 
> Let's say there are 10 standard M/R jobs running. How do I find out how many
> tasks are done/running/pending?
>  
> From: Jeff Zhang [mailto:zjf...@gmail.com] 
> Sent: Wednesday, March 09, 2016 9:33 PM
> To: Frank Luo
> Cc: user@hadoop.apache.org
> Subject: Re: how to use Yarn API to find task/attempt status
>  
> I don't think it is related to YARN. YARN doesn't know about tasks or task
> attempts; it only knows about containers. So it should be your application
> that provides such a function.
>  
> On Thu, Mar 10, 2016 at 11:29 AM, Frank Luo  wrote:
> Anyone had a similar issue and knows the answer?
>  
> From: Frank Luo 
> Sent: Wednesday, March 09, 2016 1:59 PM
> To: 'user@hadoop.apache.org'
> Subject: how to use Yarn API to find task/attempt status
>  
> I have a need to programmatically find out how many tasks are pending in 
> Yarn. Is there a way to do it through a Java API?
>  
> I looked at YarnClient, but not able to find what I need.
>  
> Thx in advance.
>  
> Frank Luo
> 
> 
> 
>  
> --
> Best Regards
> 
> Jeff Zhang


Intermediate Data Spill in Mapreduce (Buffer Memory)

2016-03-08 Thread Sultan Alamro
Hi there,

I ran a word count job on Hadoop 2.6.0 and I see that there are several
spills of the map output.

I have the following configuration:
mapreduce.task.io.sort.mb = 100
mapreduce.map.sort.spill.percent = 0.80

After running the job, the Map output bytes = 222660096.
Looking at the container log below, it seems the buffer spills after
31055173 bytes, not at 100 MB.

Look at the parameter values before the first spill is written to disk:
bufstart = 0; bufend = 31055173; bufvoid = 104857600

Dividing the Map output bytes by bufend gives 222660096/31055173 = 7.17,
i.e. 8 spills.

Why is this happening? The soft limit should be 80 MB, so I would expect
only 3 spills. Also, can anyone give a better explanation of bufstart,
bufend and bufvoid?

Thanks,
Sultan
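
(A hedged explanation, based on the MapOutputBuffer design in Hadoop 2.x: the
circular buffer holds both the serialized records and 16 bytes of accounting
metadata per record, and the 80% soft limit applies to their combined size.
At the first spill in the log below, the metadata occupies length = 13207725
ints = 52830900 bytes; adding the 31055173 record bytes gives 83886073, which
hits the 83886080-byte soft limit to within a few bytes. Because word count
emits small records, the metadata dominates, so each spill triggers at roughly
31 MB of record bytes rather than 80 MB, which accounts for the 8 spills
observed. As for the pointers: bufstart and bufend delimit the serialized
record bytes not yet spilled, and bufvoid marks the usable end of the
circular buffer.)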



2016-03-04 20:38:16,678 INFO [main] org.apache.hadoop.mapred.MapTask:
Processing split: hdfs://hmaster:9000/input/Input-128M:0+134217728
2016-03-04 20:38:16,842 INFO [main] org.apache.hadoop.mapred.MapTask:
(EQUATOR) 0 kvi 26214396(104857584)
2016-03-04 20:38:16,842 INFO [main] org.apache.hadoop.mapred.MapTask:
mapreduce.task.io.sort.mb: 100
2016-03-04 20:38:16,842 INFO [main] org.apache.hadoop.mapred.MapTask: soft
limit at 83886080
2016-03-04 20:38:16,842 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 0; bufvoid = 104857600
2016-03-04 20:38:16,842 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 26214396; length = 6553600
2016-03-04 20:38:16,854 INFO [main] org.apache.hadoop.mapred.MapTask: Map
output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2016-03-04 20:38:19,931 INFO [main] org.apache.hadoop.mapred.MapTask:
Spilling map output
2016-03-04 20:38:19,931 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 0; bufend = 31055173; bufvoid = 104857600
2016-03-04 20:38:19,931 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 26214396(104857584); kvend = 13006672(52026688); length =
13207725/6553600
2016-03-04 20:38:19,931 INFO [main] org.apache.hadoop.mapred.MapTask:
(EQUATOR) 41540922 kvi 10385224(41540896)
2016-03-04 20:38:28,178 INFO [SpillThread]
org.apache.hadoop.mapred.MapTask: Finished spill 0
2016-03-04 20:38:28,179 INFO [main] org.apache.hadoop.mapred.MapTask:
(RESET) equator 41540922 kv 10385224(41540896) kvi 7763800(31055200)
2016-03-04 20:38:30,020 INFO [main] org.apache.hadoop.mapred.MapTask:
Spilling map output
2016-03-04 20:38:30,020 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 41540922; bufend = 72594952; bufvoid = 104857600
2016-03-04 20:38:30,020 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 10385224(41540896); kvend = 23391620(93566480); length =
13208005/6553600
2016-03-04 20:38:30,020 INFO [main] org.apache.hadoop.mapred.MapTask:
(EQUATOR) 83080708 kvi 20770172(83080688)
2016-03-04 20:38:37,042 INFO [SpillThread]
org.apache.hadoop.mapred.MapTask: Finished spill 1
2016-03-04 20:38:37,042 INFO [main] org.apache.hadoop.mapred.MapTask:
(RESET) equator 83080708 kv 20770172(83080688) kvi 18148744(72594976)
2016-03-04 20:38:39,045 INFO [main] org.apache.hadoop.mapred.MapTask:
Spilling map output
2016-03-04 20:38:39,045 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 83080708; bufend = 9277592; bufvoid = 104857598
2016-03-04 20:38:39,045 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 20770172(83080688); kvend = 7562280(30249120); length =
13207893/6553600
2016-03-04 20:38:39,045 INFO [main] org.apache.hadoop.mapred.MapTask:
(EQUATOR) 19763348 kvi 4940832(19763328)
2016-03-04 20:38:46,376 INFO [SpillThread]
org.apache.hadoop.mapred.MapTask: Finished spill 2
2016-03-04 20:38:46,376 INFO [main] org.apache.hadoop.mapred.MapTask:
(RESET) equator 19763348 kv 4940832(19763328) kvi 2319404(9277616)
2016-03-04 20:38:48,320 INFO [main] org.apache.hadoop.mapred.MapTask:
Spilling map output
2016-03-04 20:38:48,320 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 19763348; bufend = 50820055; bufvoid = 104857600
2016-03-04 20:38:48,320 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 4940832(19763328); kvend = 17947892(71791568); length =
13207341/6553600
2016-03-04 20:38:48,320 INFO [main] org.apache.hadoop.mapred.MapTask:
(EQUATOR) 61305803 kvi 15326444(61305776)
2016-03-04 20:38:55,301 INFO [SpillThread]
org.apache.hadoop.mapred.MapTask: Finished spill 3
2016-03-04 20:38:55,301 INFO [main] org.apache.hadoop.mapred.MapTask:
(RESET) equator 61305803 kv 15326444(61305776) kvi 12705020(50820080)
2016-03-04 20:38:57,209 INFO [main] org.apache.hadoop.mapred.MapTask:
Spilling map output
2016-03-04 20:38:57,209 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 61305803; bufend = 92359884; bufvoid = 104857600
2016-03-04 20:38:57,209 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 15326444(61305776); kvend = 2118452(8473808); length =
13207993/6553600
2016-03-04 20:38:57,209 INFO [main] org.apache.hadoop.mapred.MapTask:
(EQUATOR) 102845638 kvi 25711404(102845616)
2016-03-04 20:39:04,710 INFO [SpillThread]

Blocks processed by which task?

2016-01-20 Thread Sultan Alamro
Hi there,

How do I know which HDFS block was processed by which task?
I want to verify whether my Hadoop setup applies the "locality" concept.


Thanks,
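
One way to check this with the default counters: the job's console summary
prints "Data-local map tasks" and "Rack-local map tasks"
(JobCounter.DATA_LOCAL_MAPS / JobCounter.RACK_LOCAL_MAPS), and each map task's
syslog prints the split it processed ("Processing split: hdfs://..."). A
minimal sketch for reading the counters, assuming a handle to the completed
job:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;

public class LocalityCheck {
    // job is assumed to be a handle to a completed MapReduce job.
    public static void printLocality(Job job) throws Exception {
        long dataLocal = job.getCounters()
                .findCounter(JobCounter.DATA_LOCAL_MAPS).getValue();
        long rackLocal = job.getCounters()
                .findCounter(JobCounter.RACK_LOCAL_MAPS).getValue();
        System.out.println("data-local maps: " + dataLocal
                + ", rack-local maps: " + rackLocal);
    }
}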


Re: Running multiple copies of each task

2015-12-17 Thread Sultan Alamro
Thanks, Namikaze!

Another question:

New tasks in Hadoop always have higher priority than speculative tasks.
Does anyone know how and where I can change this priority?


Thanks,
Sultan
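
A hedged pointer from reading the 2.6.0 source: for MapReduce, the policy that
decides when a speculative attempt is requested lives in
org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator, and the
assignment of allocated containers to new versus failed/speculative attempts
happens in org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator; those
two classes are natural starting points.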


On Thu, Dec 3, 2015 at 7:46 AM, Namikaze Minato <lloydsen...@gmail.com>
wrote:

> I think you are looking for mapreduce.reduce.speculative
> Be careful, for some reason, this fell into my spam folder.
>
> Regards,
> LLoyd
>
> On 3 December 2015 at 01:05, Sultan Alamro <sultan.ala...@gmail.com>
> wrote:
> > Hi there,
> >
> > I have been looking at the hadoop source code 2.6.0 trying to understand
> the
> > low level details and how the framework is actually working.
> >
> > I have a simple idea and I am trying to figure out where and how the idea
> > can be implemented. The idea can be described in one sentence: "Running
> > multiple copies of each task". However, implementing the idea is not as
> > simple as I think.
> >
> > What I am aware of is that I only need to modify a few classes. But,
> which
> > classes?
> >
> > I just need someone to guide me in the right direction.
> >
> >
> > Best,
> > Sultan
>


Running multiple copies of each task

2015-12-02 Thread Sultan Alamro
Hi there,

I have been looking at the hadoop source code 2.6.0 trying to understand
the low level details and how the framework is actually working.

I have a simple idea and I am trying to figure out where and how the idea
can be implemented. The idea can be described in one sentence: "Running
multiple copies of each task". However, implementing the idea is not as
simple as I think.

What I am aware of is that I only need to modify a few classes. But, which
classes?

I just need someone to guide me in the right direction.


Best,
Sultan