Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

Merto Mertek Wed, 08 Feb 2012 18:20:01 -0800

I will need your help. Please confirm whether the following procedure is
right. I have a dev environment where I pimp my scheduler (no hadoop running)
and a small cluster environment where the changes (jars) are deployed with
some scripts. However, I have never compiled the whole of hadoop from source,
so I do not know if I am doing it right. I've done it as follows:


a) apply a patch
b) cd $HD_HOME; ant
c) copy $HD_HOME/*build*/patched-core-hadoop.jar -> cluster:/$HD_HOME/*lib*
d) run $HD_HOME/bin/start-all.sh

Is this enough? When I tested with "hadoop dfs -ls /" I could see that the
new jar was not loaded; a jar from
$HD_HOME/*share*/hadoop-20.205.0.jar
was used instead.
Should I copy the entire hadoop folder to all nodes and reconfigure the
entire cluster for the new build, or is it enough to configure it just on
the node where gmetad will run?
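
One way to check which jar a class really comes from is to ask the JVM directly. The following is only a minimal sketch: WhichJar is a hypothetical helper, not part of Hadoop, and the default class name is just the file mentioned for the Hadoop-side patch in this thread. It would have to be run with the same classpath that bin/hadoop builds, otherwise the answer says nothing about the cluster.

```java
// Hypothetical helper (not part of Hadoop): reports where the JVM loaded a class from.
class WhichJar {
    /** Returns the jar or directory a class was loaded from, or a note for JDK classes. */
    static String locate(Class<?> cls) {
        java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
        return (src == null || src.getLocation() == null)
                ? "(bootstrap class path)"
                : src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Assumed default: the class touched by the Hadoop-side patch in this thread.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.metrics2.util.SampleStat";
        System.out.println(name + " <- " + locate(Class.forName(name)));
    }
}
```

If the printed location still points at the jar under share/ and not at the freshly built one, the patched jar is not the one being picked up.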

On 8 February 2012 06:33, Varun Kapoor <rez...@hortonworks.com> wrote:

> I'm so sorry, Merto - like a silly goose, I attached the 2 patches to my
> reply, and of course the mailing list did not accept the attachment.
>
> I plan on opening JIRAs for this tomorrow, but till then, here are links to
> the 2 patches (from my Dropbox account):
>
>   - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.Hadoop.patch
>   - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.gmetad.patch
>
> Here's hoping this works for you,
>
> Varun
> On Tue, Feb 7, 2012 at 6:00 PM, Merto Mertek <masmer...@gmail.com> wrote:
>
> > Varun, have I missed your link to the patches? I tried to search for
> > them on JIRA but did not find them. Can you repost the link for these two
> > patches?
> >
> > Thank you..
> >
> > On 7 February 2012 20:36, Varun Kapoor <rez...@hortonworks.com> wrote:
> >
> > > I'm sorry to hear that gmetad cores continuously for you guys. Since I'm
> > > not seeing that behavior, I'm going to just put out the 2 possible
> > > patches you could apply and wait to hear back from you. :)
> > >
> > > Option 1
> > >
> > > * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file (
> > > http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markup
> > > in my setup) in your Hadoop sources and rebuild Hadoop.
> > >
> > > Option 2
> > >
> > > * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and
> > > rebuild gmetad.
> > >
> > > Only 1 of these 2 fixes is required, and it would help me if you could
> > > first try Option 1 and let me know if that fixes things for you.
> > >
> > > Varun
> > >
> > > On Mon, Feb 6, 2012 at 10:36 PM, mete <efk...@gmail.com> wrote:
> > >
> > >> Same as Merto's situation here: it always overflows a short time after
> > >> the restart. Without the hadoop metrics enabled everything is smooth.
> > >> Regards
> > >>
> > >> Mete
> > >>
> > >> On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek <masmer...@gmail.com> wrote:
> > >>
> > >> > I have tried to run it but it keeps crashing..
> > >> >
> > >> > >   - When you start gmetad and Hadoop is not emitting metrics,
> > >> > >   everything is peachy.
> > >> >
> > >> > Right, running just ganglia without running hadoop jobs seems stable
> > >> > for at least a day..
> > >> >
> > >> > >   - When you start Hadoop (and it thus starts emitting metrics),
> > >> > >   gmetad cores.
> > >> >
> > >> > True, with the following error: *** stack smashing detected ***:
> > >> > gmetad terminated \n Segmentation fault
> > >> >
> > >> > >      - On my MacBookPro, it's a SIGABRT due to a buffer overflow.
> > >> > >
> > >> > > I believe this is happening for everyone. What I would like for you
> > >> > > to try out are the following 2 scenarios:
> > >> > >
> > >> > >   - Once gmetad cores, if you start it up again, does it core again?
> > >> > >   Does this process repeat ad infinitum?
> > >> > >      - On my MBP, the core is a one-time thing, and restarting gmetad
> > >> > >      after the first core makes things run perfectly smoothly.
> > >> > >         - I know others are saying this core occurs continuously,
> > >> > >         but they were all using ganglia-3.1.x, and I'm interested in
> > >> > >         how ganglia-3.2.0 behaves for you.
> > >> >
> > >> > It cores every time I run it. The difference is just that sometimes a
> > >> > segmentation fault appears instantly, and sometimes it appears after a
> > >> > random time, let's say after a minute of running gmetad and collecting
> > >> > data.
> > >> >
> > >> > >   - If you start Hadoop first (so gmetad is not running when the
> > >> > >   first batch of Hadoop metrics are emitted) and THEN start gmetad
> > >> > >   after a few seconds, do you still see gmetad coring?
> > >> >
> > >> > Yes
> > >> >
> > >> > >      - On my MBP, this sequence works perfectly fine, and there are
> > >> > >      no gmetad cores whatsoever.
> > >> >
> > >> > I have tested this scenario with 2 working nodes, so two gmond plus the
> > >> > head gmond on the server where gmetad is located. I have checked and
> > >> > all of them are version 3.2.0.
> > >> >
> > >> > Hope it helps..
> > >> >
> > >> >
> > >> >
> > >> > >
> > >> > > Bear in mind that this only addresses the gmetad coring issue - the
> > >> > > warnings emitted about '4.9E-324' being out of range will continue,
> > >> > > but I know what's causing that as well (and hope that my patch fixes
> > >> > > it for free).
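
For reference, '4.9E-324' is exactly how Java prints Double.MIN_VALUE, the smallest positive double (not the most negative one), and it is far below what a 32-bit float can hold. The sketch below only illustrates that numeric fact. MinValueDemo and outOfFloatRange are made-up names, and whether the metrics code really emits an untouched Double.MIN_VALUE is an assumption that this thread does not confirm.

```java
// Illustration only: shows why the literal '4.9E-324' cannot survive a conversion to float.
class MinValueDemo {
    /** True if a finite, non-zero double collapses to zero or infinity as a 32-bit float. */
    static boolean outOfFloatRange(double v) {
        if (v == 0.0 || Double.isInfinite(v) || Double.isNaN(v)) {
            return false;
        }
        float f = (float) v;
        return f == 0.0f || Float.isInfinite(f);
    }

    public static void main(String[] args) {
        // Smallest positive double; prints 4.9E-324, the value seen in the gmetad warnings.
        System.out.println(Double.MIN_VALUE);
        // Smallest positive float; prints 1.4E-45.
        System.out.println(Float.MIN_VALUE);
        System.out.println(outOfFloatRange(Double.MIN_VALUE));
        System.out.println(outOfFloatRange(Float.MIN_VALUE));
    }
}
```

A value like this underflows in any float conversion, which would be consistent with the "Numerical result out of range" message, assuming the conversion is done with float precision.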
> > >> > >
> > >> > > Varun
> > >> > > On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek <masmer...@gmail.com> wrote:
> > >> > >
> > >> > > > Yes, I am encountering the same problems and, like Mete said, a few
> > >> > > > seconds after restarting a segmentation fault appears.. here is my
> > >> > > > conf.. <http://pastebin.com/VgBjp08d>
> > >> > > >
> > >> > > > And here is some info from /var/log/messages (Ubuntu Server 10.10):
> > >> > > >
> > >> > > > kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]
> > >> > > >
> > >> > > > When I compiled gmetad I used the following command:
> > >> > > >
> > >> > > > ./configure --with-gmetad --sysconfdir=/etc/ganglia \
> > >> > > >   CPPFLAGS="-I/usr/local/rrdtool-1.4.7/include" \
> > >> > > >   CFLAGS="-I/usr/local/rrdtool-1.4.7/include" \
> > >> > > >   LDFLAGS="-L/usr/local/rrdtool-1.4.7/lib"
> > >> > > >
> > >> > > > The same was tried with rrdtool 1.4.5. My current ganglia version is
> > >> > > > 3.2.0 and, like Mete, I tried it with version 3.1.7 but without
> > >> > > > success..
> > >> > > >
> > >> > > > Hope we will sort out a solution soon..
> > >> > > > thank you
> > >> > > >
> > >> > > >
> > >> > > > On 6 February 2012 20:09, mete <efk...@gmail.com> wrote:
> > >> > > >
> > >> > > > > Hello,
> > >> > > > > I also face this issue when using GangliaContext31 with
> > >> > > > > hadoop-1.0.0 and ganglia 3.1.7 (also tried 3.1.2). I continuously
> > >> > > > > get buffer overflows as soon as I restart gmetad.
> > >> > > > > Regards
> > >> > > > > Mete
> > >> > > > >
> > >> > > > > On Mon, Feb 6, 2012 at 7:42 PM, Vitthal "Suhas" Gogate <gog...@hortonworks.com> wrote:
> > >> > > > >
> > >> > > > > > I assume you have seen the following information on the Hadoop
> > >> > > > > > wiki: http://wiki.apache.org/hadoop/GangliaMetrics
> > >> > > > > >
> > >> > > > > > So do you use GangliaContext31 in hadoop-metrics2.properties?
> > >> > > > > >
> > >> > > > > > We use Ganglia 3.2 with Hadoop 20.205 and it works fine (I
> > >> > > > > > remember seeing gmetad sometimes go down due to a buffer overflow
> > >> > > > > > problem when hadoop starts pumping in the metrics, but restarting
> > >> > > > > > works). Let me know if you face the same problem.
> > >> > > > > >
> > >> > > > > > --Suhas
> > >> > > > > >
> > >> > > > > > Additionally, the Ganglia protocol changed significantly between
> > >> > > > > > Ganglia 3.0 and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible
> > >> > > > > > with Ganglia 3.0 clients). This caused Hadoop to not work with
> > >> > > > > > Ganglia 3.1; there is a patch available for this, HADOOP-4675. As
> > >> > > > > > of November 2010, this patch has been rolled into the mainline for
> > >> > > > > > 0.20.2 and later. To use the Ganglia 3.1 protocol in place of the
> > >> > > > > > 3.0, substitute
> > >> > > > > > org.apache.hadoop.metrics.ganglia.GangliaContext31 for
> > >> > > > > > org.apache.hadoop.metrics.ganglia.GangliaContext in the
> > >> > > > > > hadoop-metrics.properties lines above.
> > >> > > > > >
> > >> > > > > > On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek <masmer...@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > I spent a lot of time trying to figure it out, however I did not
> > >> > > > > > > find a solution. Problems from the logs pointed me to some bugs
> > >> > > > > > > in the rrdupdate tool, however I tried to solve it with different
> > >> > > > > > > versions of ganglia and rrdtool but the error is the same. A
> > >> > > > > > > segmentation fault appears after the following lines if I run
> > >> > > > > > > gmetad in debug mode...
> > >> > > > > > >
> > >> > > > > > > "Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd"
> > >> > > > > > > "Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd"
> > >> > > > > > >
> > >> > > > > > > which I suppose are generated from MetricsSystemImpl.java (Is
> > >> > > > > > > there any way just to disable these two metrics?)
> > >> > > > > > >
> > >> > > > > > > In /var/log/messages there are a lot of errors:
> > >> > > > > > >
> > >> > > > > > > "xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd): converting  '4.9E-324' to float: Numerical result out of range"
> > >> > > > > > > "xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd): converting  '4.9E-324' to float: Numerical result out of range"
> > >> > > > > > >
> > >> > > > > > > so probably there are some conversion issues? Where should I look
> > >> > > > > > > for the solution? Would you rather suggest using ganglia 3.0.x
> > >> > > > > > > with the old protocol and leaving versions >3.1 for further
> > >> > > > > > > releases?
> > >> > > > > > >
> > >> > > > > > > Any help is really appreciated...
> > >> > > > > > >
> > >> > > > > > > On 1 February 2012 04:04, Merto Mertek <masmer...@gmail.com> wrote:
> > >> > > > > > >
> > >> > > > > > > > I would be glad to hear that too.. I've set up the following:
> > >> > > > > > > >
> > >> > > > > > > > Hadoop 0.20.205
> > >> > > > > > > > Ganglia Front  3.1.7
> > >> > > > > > > > Ganglia Back *(gmetad)* 3.1.7
> > >> > > > > > > > RRDTool <http://www.rrdtool.org/> 1.4.5. -> I had some troubles
> > >> > > > > > > > installing 1.4.4
> > >> > > > > > > >
> > >> > > > > > > > Ganglia works only when hadoop is not running, so that no
> > >> > > > > > > > metrics are published to the gmetad node (conf with the new
> > >> > > > > > > > hadoop-metrics2.properties). When hadoop is started, a
> > >> > > > > > > > segmentation fault appears in the gmetad daemon:
> > >> > > > > > > >
> > >> > > > > > > > sudo gmetad -d 2
> > >> > > > > > > > .......
> > >> > > > > > > > Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
> > >> > > > > > > > Updating host xxx, metric bytes_in
> > >> > > > > > > > Updating host xxx, metric bytes_out
> > >> > > > > > > > Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time
> > >> > > > > > > > Created rrd /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
> > >> > > > > > > > Segmentation fault
> > >> > > > > > > >
> > >> > > > > > > > And some info from the apache log <http://pastebin.com/nrqKRtKJ>..
> > >> > > > > > > >
> > >> > > > > > > > Can someone suggest a ganglia version that is tested with hadoop
> > >> > > > > > > > 0.20.205? I will try to sort it out, however it seems a not so
> > >> > > > > > > > trivial problem..
> > >> > > > > > > >
> > >> > > > > > > > Thank you
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On 2 December 2011 12:32, praveenesh kumar <praveen...@gmail.com> wrote:
> > >> > > > > > > >
> > >> > > > > > > >> or Do I have to apply some hadoop patch for this ?
> > >> > > > > > > >>
> > >> > > > > > > >> Thanks,
> > >> > > > > > > >> Praveenesh
> > >> > > > > > > >>
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > >
> > >
> > > http://www.hadoopsummit.org/
> > >
> > >
> >
>
>
>
> --
>
>
> http://www.hadoopsummit.org/
>
