Re:Re: Stopping ntpd signals SIGTERM, then causes namenode exit

2015-02-09 Thread David chen
Thanks for your reply.
I did indeed misunderstand how ntpd works, which is why I deployed the shell
script on every node to synchronize time. The reason the script stops ntpd
first is that the following command, 'ntpdate 192.168.0.1', fails otherwise:
ntpdate cannot bind UDP port 123 while ntpd is holding it.
Now that I understand the difference between ntpd and ntpdate, I have
discarded the script and configured ntp.conf to synchronize time instead.
The reason I posted this thread is that I still wonder why the NameNode
received the SIGTERM signal triggered by the ntpd stop command, and why all
three occurrences happened at exactly 14:00:00.
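
For reference, a minimal /etc/ntp.conf along those lines (a sketch only; the
server address is the LAN time source from the original script, and iburst
just speeds up the initial synchronization):

# /etc/ntp.conf — slew the clock continuously instead of stop/step/start
server 192.168.0.1 iburst
driftfile /var/lib/ntp/drift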

Re: Home for Apache Big Data Solutions?

2015-02-09 Thread Jay Vyas
Bigtop.. Yup!

Mr Asanjar: why don't you post an email about what you're doing on the Apache
Bigtop list? We'd love to hear from you.

There could possibly be some overlap, and our goal is to plumb the Hadoop
ecosystem as well.



> On Feb 9, 2015, at 4:41 PM, Artem Ervits  wrote:
> 
> I believe Apache Bigtop is what you're looking for.
> 
> Artem Ervits

Re: Home for Apache Big Data Solutions?

2015-02-09 Thread Artem Ervits
I believe Apache Bigtop is what you're looking for.

Artem Ervits
On Feb 9, 2015 8:15 AM, "Jean-Baptiste Onofré"  wrote:

> Hi Amir,
>
> thanks for the update.
>
> Please, let me know if you need some help on the proposal and to "qualify"
> your ideas.
>
> Regards
> JB

Re: Kill one node on map start

2015-02-09 Thread Ravi Prakash
In unit tests MiniMRYarnCluster is used to do this kind of stuff.

On Friday, February 6, 2015 3:51 AM, Telles Nobrega  wrote:

Hi, I'm working on an experiment and I need to do something like this: start a
Hadoop job (wordcount, terasort, pi), let the application organize the
distribution of files, and then kill a single node of my topology before the
map phase actually starts working. Is there any automated way I can do that?
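
For a live (non-unit-test) cluster, a rough shell sketch of the idea follows;
the jar path, hostname, and sleep are illustrative, and it assumes
passwordless SSH to the chosen worker:

#!/bin/bash
# hypothetical sketch: start a job, then kill one worker's NodeManager
# before the map phase gets going; tune the sleep to your cluster
hadoop jar hadoop-mapreduce-examples.jar wordcount /input /output &
sleep 5    # crude: after the AM starts, before map tasks begin
ssh slave3 'pkill -f org.apache.hadoop.yarn.server.nodemanager.NodeManager'
wait       # then observe how the job recovers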



Re: Stopping ntpd signals SIGTERM, then causes namenode exit

2015-02-09 Thread Alexander Alten-Lorenz
I would focus on:

Jan  7 14:52:48 host1 ntpd[44765]: no servers reachable

That looks to me like a network / DNS issue. You can also check dmesg to see
what's going on.

BR
- Alexander
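
A few quick checks for that symptom (standard commands; 192.168.0.1 is the
LAN time source from the original script):

ntpq -p                  # peer status; '*' marks the currently selected server
ntpdate -q 192.168.0.1   # query-only reachability test, does not step the clock
dmesg | tail -n 50       # recent kernel messages (link flaps, NIC resets, ...)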




Re: Max Connect retries

2015-02-09 Thread Telles Nobrega
It did finish, but it took hours, and in one case it didn't finish at all.
The same thing happened running the pi estimator.

On Mon Feb 09 2015 at 15:24:11 daemeon reiydelle  wrote:

> Are your nodes actually stuck or are you in e.g. a reduce step that is
> reading so much data across the network that the node SEEMS unreachable?
>
>
> Since you mention "gets stuck for a while at 25%", that suggests that
> eventually the node finishes up its work ...
>
>
>
> ...
> “Life should not be a journey to the grave with the intention of arriving
> safely in a pretty and well preserved body, but rather to skid in broadside
> in a cloud of smoke, thoroughly used up, totally worn out, and loudly
> proclaiming “Wow! What a Ride!”
> - Hunter Thompson
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872


Re: Max Connect retries

2015-02-09 Thread daemeon reiydelle
Are your nodes actually stuck or are you in e.g. a reduce step that is
reading so much data across the network that the node SEEMS unreachable?


Since you mention "gets stuck for a while at 25%", that suggests that
eventually the node finishes up its work ...



...
“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!”
- Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872



Re: Adding datanodes to Hadoop cluster - Will data redistribute?

2015-02-09 Thread Manoj Venkatesh
Thank you all for answering; the hdfs balancer worked. Now the datanodes'
capacity is more or less equally balanced.

Regards,
Manoj

From: Arpit Agarwal <aagar...@hortonworks.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Friday, February 6, 2015 at 3:07 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Hi Manoj,

Existing data is not automatically redistributed when you add new DataNodes. 
Take a look at the 'hdfs balancer' command which can be run as a separate 
administrative tool to rebalance data distribution across DataNodes.
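
For example (the threshold is illustrative; the default is 10 percent):

# iterate until every DataNode's usage is within 5% of the cluster average
hdfs balancer -threshold 5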


From: Manoj Venkatesh <manove...@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Friday, February 6, 2015 at 11:34 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes; 6 were added during cluster creation, and 2
additional nodes were added later to increase disk and CPU capacity. What I see
is that processing is shared amongst all the nodes, whereas storage is reaching
capacity on the original 6 nodes while the newly added machines still have a
relatively large amount of storage unoccupied.

I was wondering if there is an automated way, or any way, of redistributing
data so that all the nodes are equally utilized. I have checked the
configuration parameter dfs.datanode.fsdataset.volume.choosing.policy, which
has the options 'Round Robin' and 'Available Space'; are there any other
configurations which need to be reviewed?

Thanks,
Manoj



Re: Stopping ntpd signals SIGTERM, then causes namenode exit

2015-02-09 Thread daemeon reiydelle
Losing the configured ntpd time source in Hadoop is absolutely a critical
error: replication and many other services require millisecond-level time
sync between the nodes. Interesting that your SRE design called for ntpd
running on each node. Curious.

What is the problem you are trying to solve by stopping ntpd on the local
host? Did someone not understand how ntpd works? Did someone configure it
to (I sure hope not) be free running?




...
“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!”
- Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Sun, Feb 8, 2015 at 7:30 PM, David chen  wrote:

> A shell script is deployed on every node of the HDFS cluster; it is
> invoked hourly by crontab, and its content is as follows:
> #!/bin/bash
> service ntpd stop
> ntpdate 192.168.0.1 #it's a valid ntpd server in LAN
> service ntpd start
> chkconfig ntpd on
>
> After several days, the NameNode crashed suddenly, but its log showed no
> errors other than the following:
> 2015-01-07 14:00:00,709 ERROR
> org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
>
> Inspected the Linux log(Centos /var/log/messages), also found the
> following clues:
> Jan  7 14:00:01 host1 ntpd[32101]: ntpd exiting on signal 15
> Jan  7 13:59:59 host1 ntpd[44764]: ntpd 4.2.4p8@1.1612-o Fri Feb 22
> 11:23:27 UTC 2013 (1)
> Jan  7 13:59:59 host1 ntpd[44765]: precision = 0.143 usec
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #0 wildcard,
> 0.0.0.0#123 Disabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #1 wildcard,
> ::#123 Disabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #2 lo, ::1#123
> Enabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #3 em2,
> fe80::ca1f:66ff:fee1:eed#123 Enabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #4 lo,
> 127.0.0.1#123 Enabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #5 em2,
> 192.168.1.151#123 Enabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on routing socket on fd #22
> for interface updates
> Jan  7 13:59:59 host1 ntpd[44765]: kernel time sync status 2040
> Jan  7 13:59:59 host1 ntpd[44765]: frequency initialized 499.399 PPM from
> /var/lib/ntp/drift
> Jan  7 14:00:01 host1 ntpd_initres[32103]: parent died before we finished,
> exiting
> Jan  7 14:04:17 host1 ntpd[44765]: synchronized to 192.168.0.191, stratum 2
> Jan  7 14:04:17 host1 ntpd[44765]: kernel time sync status change 2001
> Jan  7 14:26:02 host1 snmpd[4842]: Received TERM or STOP signal...
>  shutting down...
> Jan  7 14:26:02 host1 kernel: netlink: 12 bytes leftover after parsing
> attributes.
> Jan  7 14:26:02 host1 snmpd[45667]: NET-SNMP version 5.5
> Jan  7 14:52:48 host1 ntpd[44765]: no servers reachable
>
> It looks likely that NameNode received the SIGTERM signal sent by
> stopping ntpd command.
> Up to now, the problem has happened three times repeatedly, the time point
> was Jan  7 14:00:00, Jan 14 14:00:00 and Feb  4 14:00:00 respectively.
> Although the script used to synchronize time is a little improper, and I
> now know the correct way to synchronize, I wonder why the NameNode received
> the SIGTERM signal sent by the ntpd stop command, and why all three
> occurrences happened at 14:00:00.
> Any ideas would be appreciated.
>
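
One way to answer the "who sent the SIGTERM" question directly is to audit
kill(2) calls until the next 14:00 occurrence. A sketch, assuming auditd is
installed on an x86_64 host (the key name is illustrative):

# record every kill() syscall, including the sender's pid and command
auditctl -a always,exit -F arch=b64 -S kill -k nn_sigterm
# after the next occurrence, look for records with a1=0xf (SIGTERM)
ausearch -k nn_sigterm -i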


Re: yarn jobhistory server not displaying all jobs

2015-02-09 Thread Matt K
I found the root cause. Sharing in case someone else runs into this issue.
I'm running Yarn, Hadoop 2.3.

The reason the jobs weren’t showing up in JobHistoryServer had to do with
how we submit jobs. If the same job is submitted via “hadoop jar …”
everything works fine. But if the job is submitted via “java –cp … “ which
is what we are doing, the job runs fine and all, but doesn’t make it to
JobHistoryServer.

The difference there is the classpath. When I added `hadoop classpath` to
our class path, the jobs started to show up.

There is definitely a bug in error handling, since there were no errors or
warnings in any of the Hadoop logs, yet clearly a class required on the
client side was missing. I haven't tried tracking down the missing
jar/class.
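
For example, a submission along these lines picks up the history handling
correctly (the jar name and driver class are illustrative):

java -cp "myjob.jar:$(hadoop classpath)" com.example.MyDriver /input /output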

-Matt

On Tue, Jan 27, 2015 at 9:39 AM, Matt K  wrote:

> Thanks Ravi! This helps.
>
> On Mon, Jan 26, 2015 at 2:22 PM, Ravi Prakash  wrote:
>
>> Hi Matt!
>>
>> Take a look at the mapreduce.jobhistory.* configuration parameters here
>> for the delay in moving finished jobs to the HistoryServer:
>>
>> https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
>>
>> I've seen this error "hadoop is not allowed to impersonate hadoop" when I
>> tried configuring hadoop proxy users
>>
>>
>>   On Friday, January 23, 2015 10:43 AM, Matt K 
>> wrote:
>>
>>
>> Hello,
>>
>> I am having an issue with Yarn's JobHistory Server, which is making it
>> painful to debug jobs. The latest jobs (from the last 12 hours or so) are
>> missing from the JobHistory Server, but present in the ResourceManager
>> Yarn UI. I see only 8 jobs in the JobHistory, but 15 in the Yarn UI.
>>
>> Not much useful stuff in the logs. Every few hours, this exception pops
>> up in mapred-hadoop-historyserver.log, but I don't know if it's related.
>>
>> 2015-01-23 03:41:40,003 WARN
>> org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService: Could not process
>> job files
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>> User: hadoop is not allowed to impersonate hadoop
>> at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>> at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
>> at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>> at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:219)
>> at
>> org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1137)
>> at
>> org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1127)
>> at
>> org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1117)
>> at
>> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:264)
>> at
>> org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:231)
>> at
>> org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:224)
>> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1290)
>> at
>> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300)
>> at
>> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296)
>> at
>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>> at
>> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:296)
>> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
>> at
>> org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler.buildJobIndexInfo(KilledHistoryService.java:196)
>> at
>> org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler.access$100(KilledHistoryService.java:85)
>> at
>> org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler$1.run(KilledHistoryService.java:128)
>> at
>> org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler$1.run(KilledHistoryService.java:125)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>> at
>> org.apache.hadoop.mapred

Re: Home for Apache Big Data Solutions?

2015-02-09 Thread Jean-Baptiste Onofré

Hi Amir,

thanks for the update.

Please, let me know if you need some help on the proposal and to 
"qualify" your ideas.


Regards
JB

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Home for Apache Big Data Solutions?

2015-02-09 Thread MrAsanjar .
Hi Chris,
thanks for the information, will get on it ...

Hi JB
Glad that you are familiar with Juju; however, my personal goal is not to
promote any tool but to take the next step, which is to build a community
for Apache Big Data solutions.

>>do you already have a kind of proposal/description of your projects ?
working on it :) I got the idea while flying back from South Africa on
Saturday. During my trip I noticed most of the communities spending
their precious resources on solution plumbing, without much emphasis on
solution best practices, due to the lack of expertise. By the time a
Big Data solution framework becomes operational, funding has diminished
enough to limit solution activity (i.e. data analytic payload development).
I am sure we could find
similar scenarios with other institutions and SMBs (small and medium-size
businesses) anywhere.
In a nutshell, my goals are as follows:
1) Make Big Data solutions available to everyone
2) Encapsulate the best practices
3) All orchestration tools are welcome - some solutions could have hybrid
tooling models
4) Enforce automated testing and quality control.
5) Share analytic payloads (i.e. MapReduce apps, Storm topologies, Pig
scripts, ...)


>>Is it like distribution, or tooling ?
Good question; I envision a distribution model, as it has dependencies
on Apache Hadoop project distributions.

>>What's the current license ?
Charms/Bundles are moving to the Apache 2.0 license, target date 2/27.

Regards
Amir Sanjar
Big Data Solution Lead
Canonical

On Sun, Feb 8, 2015 at 10:46 AM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Dear Amir,
>
> Thank you for your interest in contributing these projects
> to the ASF! Sincerely appreciate it.
>
> My suggestion would be to look into the Apache Incubator,
> which is the home for incoming projects at the ASF. The
> TL;DR answer is:
>
> 1. You’ll need to create a proposal for each project
> that you would like to bring in using:
> http://incubator.apache.org/guides/proposal.html
>
>
> 2. You should put your proposal up on a public wiki
> for each project:
> http://wiki.apache.org/incubator/
> create a new page e.g., YourProjectProposal, which would in
> turn become http://wiki.apache.org/incubator/YouProjectProposal
> You will need to request permissions to add the page on the
> wiki
>
> 3. Recruit at least 3 IPMC/ASF members to mentor your project:
> http://people.apache.org/committers-by-project.html#incubator-pmc
>
> http://people.apache.org/committers-by-project.html#member
>
>
> 4. Submit your proposal for consideration at the Incubator
> 5. Enjoy!
>
> Cheers and good luck.
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
>
> -Original Message-
> From: "MrAsanjar ." 
> Reply-To: "user@hadoop.apache.org" 
> Date: Sunday, February 8, 2015 at 8:36 AM
> To: "user@hadoop.apache.org" ,
> "dev-i...@bigtop.apache.org" 
> Subject: Home for Apache Big Data Solutions?
>
> >Hi all,
> >My name is Amir Sanjar, Big Data Solution Development Lead at Canonical.
> >My team has been developing various Big Data solutions built on top of
> >Apache Hadoop projects (i.e. Hadoop, Hive, Pig,..) . We would like to
> >contribute these pure open source solutions
> > to the Apache community. I wish to propose creating an apache project to
> >house all big data solutions regardless of orchestration tool (i.e. Juju,
> >Ambari, Chef, ClouderaManager, Docker, ...), any suggestions ?
> >
> >
> >
> >
> >Regards
> >Amir Sanjar
> >Big Data Solution Lead
> >Canonical
> >
> >
>
>


Re: Max Connect retries

2015-02-09 Thread Telles Nobrega
Thanks

On Mon Feb 09 2015 at 01:43:24 Xuan Gong  wrote:

>  That is for client connect retry in ipc level.
>
> You can decrease the max.retries by configuring
>
> ipc.client.connect.max.retries.on.timeouts
>
> in core-site.xml
>
>
>  Thanks
>
>  Xuan Gong
>
>   From: Telles Nobrega 
> Reply-To: "user@hadoop.apache.org" 
> Date: Saturday, February 7, 2015 at 8:37 PM
> To: "user@hadoop.apache.org" 
> Subject: Max Connect retries
>
>   Hi, I changed my cluster config so a failed nodemanager can be detected
> in about 30 seconds. When I'm running a wordcount, the reduce gets stuck
> at 25% for quite a while, and the logs show nodes trying to connect to the
> failed node:
>
>  org.apache.hadoop.ipc.Client: Retrying connect to server: 
> hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already 
> tried 28 time(s); maxRetries=45
> 2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request 
> from attempt_1423319128424_0025_r_00_0. startIndex 24 maxEvents 1
>
> Is this the expected behaviour? Should I change max retries to a lower
> value? If so, which config is that?
>
> Thanks
>
>
>
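
For reference, the property Xuan mentions, as it would appear in
core-site.xml (the value is illustrative; the default is 45, which matches
the maxRetries=45 in the log above):

<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>10</value>  <!-- default is 45; lower it to give up sooner -->
</property>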