Hadoop-on-demand and torque

2012-05-17 Thread Merto Mertek
If I understand it correctly, HOD is intended mainly for running Hadoop on
existing HPC clusters and for testing purposes..

I cannot find what the role of Torque is here (just the initial node
allocation?) and which scheduler HOD uses by default. Probably the
scheduler from the Hadoop distribution?

The docs mention the Maui scheduler, but if there were an integration with
Hadoop, there would presumably be some documentation on it..

thanks..


Re: Distributing MapReduce on a computer cluster

2012-04-25 Thread Merto Mertek
For load distribution you can start by reading about the different types of
Hadoop schedulers. I have not yet studied implementations other than Hadoop,
but a very simplified version of the distribution concept is the following:

a) The tasktracker asks for work (the heartbeat carries the status of the
worker node, e.g. the number of free slots)
b) The jobtracker picks a job from a list that is sorted according to the
configured policy (fair scheduling, FIFO, LIFO, other SLAs)
c) The tasktracker executes the assigned map/reduce tasks

As mentioned before, there are a lot more details.. In b) there is an
implementation of delay scheduling, which improves throughput by taking the
location of a job's input data into account when picking a job. There is also
a preemption mechanism that regulates fairness between pools, etc..
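To make step b) concrete, here is a toy, self-contained illustration (not
Hadoop code; all class and field names are made up): the pending-job list is
simply re-sorted by the active policy's comparator before the next job is
picked.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PolicySortDemo {
  static class PendingJob {
    final String name;
    final long submitTime;   // FIFO key
    final int runningTasks;  // crude stand-in for a fair-share deficit
    PendingJob(String name, long submitTime, int runningTasks) {
      this.name = name; this.submitTime = submitTime; this.runningTasks = runningTasks;
    }
  }

  public static void main(String[] args) {
    List<PendingJob> jobs = new ArrayList<PendingJob>();
    jobs.add(new PendingJob("etl", 1L, 40));
    jobs.add(new PendingJob("adhoc-query", 2L, 0));

    // FIFO: oldest submission first
    Collections.sort(jobs, new Comparator<PendingJob>() {
      public int compare(PendingJob a, PendingJob b) {
        return Long.compare(a.submitTime, b.submitTime);
      }
    });
    System.out.println("FIFO gives the free slot to: " + jobs.get(0).name);

    // "Fair": the job furthest below its share (fewest running tasks) first
    Collections.sort(jobs, new Comparator<PendingJob>() {
      public int compare(PendingJob a, PendingJob b) {
        return a.runningTasks - b.runningTasks;
      }
    });
    System.out.println("Fair sharing gives it to:    " + jobs.get(0).name);
  }
}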

 A good start is the book that Prashant mentioned...

On 23 April 2012 23:49, Prashant Kommireddi prash1...@gmail.com wrote:

 Shailesh, there's a lot that goes into distributing work across
 tasks/nodes. It's not just distributing work but also fault-tolerance,
 data locality etc that come into play. It might be good to refer
 Hadoop apache docs or Tom White's definitive guide.

 Sent from my iPhone

 On Apr 23, 2012, at 11:03 AM, Shailesh Samudrala shailesh2...@gmail.com
 wrote:

  Hello,
 
  I am trying to design my own MapReduce Implementation and I want to know
  how hadoop is able to distribute its workload across multiple computers.
  Can anyone shed more light on this? thanks!



Re: Algorithms used in fairscheduler 0.20.205

2012-04-23 Thread Merto Mertek
Anyone?

On 19 April 2012 17:34, Merto Mertek masmer...@gmail.com wrote:

 I could find that the closest document matching the current implementation
 of the fair scheduler is this one:
 http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.html
 by Matei Zaharia et al. Another document, on delay scheduling, can be found
 from 2010..

 a) Is there perhaps a newer documented version of the implementation?
 b) Besides delay scheduling, the copy-compute splitting algorithm and the
 fair-share calculation algorithm, are there any other algorithms that are
 important for cluster performance and fair sharing?
 c) Is there any connection between copy-compute splitting and the MapReduce
 phases (copy-sort-reduce)?

 Thank you..



Algorithms used in fairscheduler 0.20.205

2012-04-19 Thread Merto Mertek
I could find that the closest document matching the current implementation
of the fair scheduler is this one:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.html
by Matei Zaharia et al. Another document, on delay scheduling, can be found
from 2010..

a) Is there perhaps a newer documented version of the implementation?
b) Besides delay scheduling, the copy-compute splitting algorithm and the
fair-share calculation algorithm, are there any other algorithms that are
important for cluster performance and fair sharing?
c) Is there any connection between copy-compute splitting and the MapReduce
phases (copy-sort-reduce)?

Thank you..


Fairscheduler - disable default pool

2012-03-13 Thread Merto Mertek
I know that by design all unassigned jobs go to that pool; however, I am
doing some testing and would like to know whether it is possible to disable
it..

Thanks


Re: Fairscheduler - disable default pool

2012-03-13 Thread Merto Mertek
Thanks for the workaround, but I think this just puts a constraint on the
pool so that it will not accept any jobs. I am doing some calculations with
weights and do not want the default pool's weight to be included in the
computation..
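For reference, a sketch of an allocations file that applies J-D's trick (the
pool name and limits below are examples only): the per-pool default is capped
at zero jobs, so only pools that explicitly override it can run anything.

<?xml version="1.0"?>
<allocations>
  <!-- jobs left in the default pool never run -->
  <poolMaxJobsDefault>0</poolMaxJobsDefault>
  <pool name="research">
    <maxRunningJobs>5</maxRunningJobs>
  </pool>
</allocations>

As noted above, this only starves the default pool; its weight still takes
part in the fair-share computation.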


On 13 March 2012 18:52, Jean-Daniel Cryans jdcry...@apache.org wrote:

 We do it here by setting this:

 <poolMaxJobsDefault>0</poolMaxJobsDefault>

 So that you _must_ have a pool (that's configured with a different
 maxRunningJobs) in order to run jobs.

 Hope this helps,

 J-D

 On Tue, Mar 13, 2012 at 10:49 AM, Merto Mertek masmer...@gmail.com
 wrote:
  I know that by design all unassigned jobs go to that pool; however, I am
  doing some testing and would like to know whether it is possible to disable
  it..
 
  Thanks



Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Merto Mertek
From the fairscheduler docs I assume the following should work:

<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>

<property>
  <name>pool.name</name>
  <value>${mapreduce.job.group.name}</value>
</property>

which means that the default pool will be the group of the user that
submitted the job. In your case I think the allocations.xml is correct. If
you want to explicitly assign a job to a specific pool defined in your
allocations.xml file, you can do it as follows:

Configuration conf3 = conf;
conf3.set("pool.name", "pool3"); // conf.set("property.name", "value")

Let me know if it works..
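A minimal end-to-end sketch of submitting a job to a specific pool with the
old mapred API (the class name, pool name and paths are only examples, and it
assumes the poolnameproperty setting shown above):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitToPool {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitToPool.class);
    conf.setJobName("job-in-pool3");

    // Routes the job to "pool3"; this only works if
    // mapred.fairscheduler.poolnameproperty is set to "pool.name" as above.
    conf.set("pool.name", "pool3");

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // Mapper/reducer classes are omitted; the identity defaults are used.
    JobClient.runJob(conf);
  }
}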


On 29 February 2012 14:18, Austin Chungath austi...@gmail.com wrote:

 How can I set the fair scheduler such that all jobs submitted from a
 particular user group go to a pool with the group name?

 I have setup fair scheduler and I have two users: A and B (belonging to the
 user group hadoop)

 When these users submit hadoop jobs, the jobs from A go to a pool named A
 and the jobs from B go to a pool named B.
  I want them to go to a pool with their group name, So I tried adding the
 following to mapred-site.xml:

 <property>
   <name>mapred.fairscheduler.poolnameproperty</name>
   <value>group.name</value>
 </property>

 But instead the jobs now go to the default pool.
 I want the jobs submitted by A and B to go to the pool named hadoop. How
 do I do that?
 also how can I explicity set a job to any specified pool?

 I have set the allocation file (fair-scheduler.xml) like this:

 <allocations>
   <pool name="hadoop">
     <minMaps>1</minMaps>
     <minReduces>1</minReduces>
     <maxMaps>3</maxMaps>
     <maxReduces>3</maxReduces>
   </pool>
   <userMaxJobsDefault>5</userMaxJobsDefault>
 </allocations>

 Any help is greatly appreciated.
 Thanks,
 Austin



Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-29 Thread Merto Mertek
Varun, sorry for my late response. Today I deployed a new version and I can
confirm that the patches you provided work well. I've been running jobs on a
5-node cluster under full load for an hour without a core dump, so now things
work as expected.

Thank you again!

I used just your first option..

On 15 February 2012 19:53, mete efk...@gmail.com wrote:

 Well rebuilding ganglia seemed easier and Merto was testing the other so i
 though that i should give that one a chance :)
 anyway i will send you gdb details or patch hadoop and try it at my
 earliest convenience

 Cheers

 On Wed, Feb 15, 2012 at 6:59 PM, Varun Kapoor rez...@hortonworks.com
 wrote:

  The warnings about underflow are totally expected (they come from
 strtod(),
  and they will no longer occur with Hadoop-1.0.1, which applies my patch
  from HADOOP-8052), so that's not worrisome.
 
  As for the buffer overflow, do you think you could show me a backtrace of
  this core? If you can't find the core file on disk, just start gmetad
 under
  gdb, like so:
 
  $ sudo gdb <path to gmetad>
 
  (gdb) r --conf=<path to your gmetad.conf>
  ...
  ::Wait for crash::
  (gdb) bt
  (gdb) info locals
 
  If you're familiar with gdb, then I'd appreciate any additional diagnosis
  you could perform (for example, to figure out which metric's value caused
  this buffer overflow) - if you're not, I'll try and send you some gdb
  scripts to narrow things down once I see the output from this round of
  debugging.
 
  Also, out of curiosity, is patching Hadoop not an option for you? Or is
 it
  just that rebuilding (and redeploying) ganglia is the lesser of the 2
  evils? :)
 
  Varun
 
  On Tue, Feb 14, 2012 at 11:43 PM, mete efk...@gmail.com wrote:
 
   Hello Varun,
   i have patched and recompiled ganglia from source bit it still cores
  after
   the patch.
  
   Here are some logs:
   Feb 15 09:39:14 master gmetad[16487]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd):
  
  
 
 /var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd:
   converting '4.9E-324' to float: Numerical result out of range
   Feb 15 09:39:14 master gmetad[16487]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd):
  
  
 
 /var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd:
   converting '4.9E-324' to float: Numerical result out of range
   Feb 15 09:39:14 master gmetad[16487]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd):
  
  
 
 /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd:
   converting '4.9E-324' to float: Numerical result out of range
   Feb 15 09:39:14 master gmetad[16487]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
  
  
 
 /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd:
   converting '4.9E-324' to float: Numerical result out of range
   Feb 15 09:39:14 master gmetad[16487]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd):
  
  
 
 /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd:
   converting '4.9E-324' to float: Numerical result out of range
   Feb 15 09:39:14 master gmetad[16487]: *** buffer overflow detected ***:
   gmetad terminated
  
   i am using hadoop.1.0.0 and ganglia 3.20 tarball.
  
   Cheers
   Mete
  
   On Sat, Feb 11, 2012 at 2:19 AM, Merto Mertek masmer...@gmail.com
  wrote:
  
Varun unfortunately I have had some problems with deploying a new
  version
on the cluster.. Hadoop is not picking the new build in lib folder
   despite
a classpath is set to it. The new build is picked just if I put it in
  the
$HD_HOME/share/hadoop/, which is very strange.. I've done this on all
   nodes
and can access the web, but all tasktracker are being stopped because
  of
   an
error:
   
INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager:
   Cleanup...
 java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at

   
  
 
 org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926)

   
   
Probably the error is the consequence of an inadequate deploy of a
  jar..
   I
will ask to the dev list how they do it or are you maybe having any
  other
idea?
   
   
   
On 10 February 2012 17:10, Varun Kapoor rez...@hortonworks.com
  wrote:
   
 Hey Merto,

 Any luck getting the patch running on your cluster?

 In case you're interested, there's now a JIRA for this:
 https://issues.apache.org/jira/browse/HADOOP-8052.

 Varun

 On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor 
 rez

Re: Tasktracker fails

2012-02-22 Thread Merto Mertek
Hm.. I would first try to stop all the daemons with
$HADOOP_HOME/bin/stop-all.sh. Afterwards, check that no daemons are still
running on the master and on one of the slaves (jps). Maybe you could also
check that the tasktrackers' configuration points at the right jobtracker
(mapred-site.xml). Do you see any errors in the jobtracker log too?
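Roughly, those checks look like this on the command line (paths assume a
stock tarball layout and may differ on your install):

$HADOOP_HOME/bin/stop-all.sh                                    # on the master: stops all HDFS and MapReduce daemons
jps                                                             # on master and slaves: no NameNode/DataNode/JobTracker/TaskTracker should remain
grep -A 1 mapred.job.tracker $HADOOP_HOME/conf/mapred-site.xml  # tasktrackers must point at the real jobtracker host:port
tail -n 100 $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log         # look for bind/connect errors on the jobtracker side as well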


On 22 February 2012 09:44, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 Any update on the below issue.

 Thanks


 Adarsh Sharma wrote:

 Dear all,

 Today I am trying  to configure hadoop-0.20.205.0 on a 4  node Cluster.
 When I start my cluster , all daemons got started except tasktracker,
 don't know why task tracker fails due to following error logs.

 Cluster is in private network.My /etc/hosts file contains all IP hostname
 resolution commands in all  nodes.

 2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter:
 MBean for source TaskTrackerMetrics registered.
 2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker:
 Can not start task tracker because java.net.SocketException: Invalid argument
   at sun.nio.ch.Net.bind(Native Method)
   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
   at org.apache.hadoop.ipc.Server.bind(Server.java:225)
   at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301)
   at org.apache.hadoop.ipc.Server.init(Server.java:1483)
   at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545)
   at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
   at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
   at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1428)
   at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)

 Any comments on the issue.


 Thanks





Re: Tasktracker fails

2012-02-22 Thread Merto Mertek
I do not know exactly how the distribution and splitting of deflate files
works, if that is your question, but you will probably find something useful
in the *Codec classes, where the implementations of the various compression
formats live. Deflate files are just one type of compressed file that you can
use for storing data in your system. There are several other types, depending
on your needs and the trade-offs you are dealing with (space versus
compression time).

Globs, I think, are just a matching strategy for matching files/folders with
wildcard patterns..
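As a small illustration of the glob API (the directory and pattern below are
hypothetical), listing .deflate outputs from Java looks roughly like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListDeflateFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    // Match every compressed part file under a (hypothetical) job output directory.
    FileStatus[] matches = fs.globStatus(new Path("/user/jay/output/part-*.deflate"));
    if (matches != null) {
      for (FileStatus status : matches) {
        System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
      }
    }
  }
}

Note that the matched files are still compressed; to read their contents you
would go through the matching codec, e.g. via CompressionCodecFactory.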


On 22 February 2012 19:29, Jay Vyas jayunit...@gmail.com wrote:

 Hi guys !

 Im trying to understand the way globstatus / deflate files work in hdfs.  I
 cant read them using the globStatus API in the hadoop FileSystem , from
 java.  the specifics are here if anyone wants some easy stackoverflow
 points :)


 http://stackoverflow.com/questions/9400739/hadoop-globstatus-and-deflate-files

 On Wed, Feb 22, 2012 at 7:39 AM, Merto Mertek masmer...@gmail.com wrote:

  Hm.. I would try first to stop all the deamons wtih
  $haddop_home/bin/stop-all.sh. Afterwards check that on the master and one
  of the slaves no deamons are running (jps). Maybe you could try to check
 if
  your conf on tasktrackers for the jobtracker is pointing to the right
 place
  (mapred-site.xml). Do you see any error in the jobtracker log too?
 
 
  On 22 February 2012 09:44, Adarsh Sharma adarsh.sha...@orkash.com
 wrote:
 
   Any update on the below issue.
  
   Thanks
  
  
   Adarsh Sharma wrote:
  
   Dear all,
  
   Today I am trying  to configure hadoop-0.20.205.0 on a 4  node
 Cluster.
   When I start my cluster , all daemons got started except tasktracker,
   don't know why task tracker fails due to following error logs.
  
   Cluster is in private network.My /etc/hosts file contains all IP
  hostname
   resolution commands in all  nodes.
  
    2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter:
    MBean for source TaskTrackerMetrics registered.
    2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker:
    Can not start task tracker because java.net.SocketException: Invalid argument
      at sun.nio.ch.Net.bind(Native Method)
      at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
      at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
      at org.apache.hadoop.ipc.Server.bind(Server.java:225)
      at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301)
      at org.apache.hadoop.ipc.Server.init(Server.java:1483)
      at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545)
      at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
      at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
      at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1428)
      at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)
  
   Any comments on the issue.
  
  
   Thanks
  
  
  
 



 --
 Jay Vyas
 MMSB/UCHC



Re: Dynamic changing of slaves

2012-02-21 Thread Merto Mertek
I think that the job configuration does not allow such a setup, but maybe I
missed something..

I would probably tackle this problem from the scheduler source. The default
scheduler is JobQueueTaskScheduler, which maintains a FIFO-based queue. When
a tasktracker (your slave) tells the jobtracker that it has free slots, the
JT's heartbeat method calls the scheduler's assignTasks method, where tasks
are assigned on a locality basis. In other words, the scheduler tries to find
tasks whose input data resides on that tasktracker. If the scheduler does not
find a local map/reduce task to run, it tries to find a non-local one. This
is probably the point where you should intervene for your jobs while waiting
for the tasktracker's heartbeat. Instead of waiting for the TT heartbeat,
there may be a way to force a heartbeatResponse even though the TT has not
sent a heartbeat, but I am not aware of one..
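A rough skeleton (not working code, and not from this thread) of where such
logic would live. It assumes the branch-0.20.20x/1.x scheduler API, in which
assignTasks receives a jobtracker-side TaskTracker object; older 0.20
releases pass a TaskTrackerStatus instead, so check TaskScheduler in your own
source tree first:

// Sketch only. Contrib schedulers sit in this package because JobInProgress,
// Task and friends are package-private.
package org.apache.hadoop.mapred;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.mapreduce.server.jobtracker.TaskTracker;

public class SelectiveTaskScheduler extends TaskScheduler {

  @Override
  public List<Task> assignTasks(TaskTracker tracker) throws IOException {
    // The heartbeat tells us which tracker is asking and how many free slots it has.
    TaskTrackerStatus status = tracker.getStatus();
    // Decide here whether this tracker should get work at all (e.g. rotate
    // through the slaves), then obtain tasks from the chosen JobInProgress via
    // obtainNewMapTask()/obtainNewReduceTask(), as JobQueueTaskScheduler does.
    return Collections.<Task>emptyList();   // returning nothing leaves the tracker idle
  }

  @Override
  public Collection<JobInProgress> getJobs(String queueName) {
    return Collections.<JobInProgress>emptyList();
  }
}

Such a class would be plugged in through the mapred.jobtracker.taskScheduler
property in mapred-site.xml, the same property the fair and capacity
schedulers use.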


On 21 February 2012 19:27, theta glynisdso...@email.arizona.edu wrote:


 Hi,

 I am working on a project which requires a setup as follows:

 One master with four slaves.However, when a map only program is run, the
 master dynamically selects the slave to run the map. For example, when the
 program is run for the first time, slave 2 is selected to run the map and
 reduce programs, and the output is stored on dfs. When the program is run
 the second time, slave 3 is selected and son on.

 I am currently using Hadoop 0.20.2 with Ubuntu 11.10.

 Any ideas on creating the setup as described above?

 Regards

 --
 View this message in context:
 http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-10 Thread Merto Mertek
Varun, unfortunately I have had some problems with deploying the new version
on the cluster.. Hadoop is not picking up the new build from the lib folder
despite the classpath being set to it. The new build is picked up only if I
put it in $HD_HOME/share/hadoop/, which is very strange.. I've done this on
all nodes and can access the web UI, but all tasktrackers are being stopped
because of an error:

INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Cleanup...
 java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at
 org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926)



The error is probably the consequence of an incorrect jar deployment.. I will
ask the dev list how they do it, or do you maybe have any other idea?



On 10 February 2012 17:10, Varun Kapoor rez...@hortonworks.com wrote:

 Hey Merto,

 Any luck getting the patch running on your cluster?

 In case you're interested, there's now a JIRA for this:
 https://issues.apache.org/jira/browse/HADOOP-8052.

 Varun

 On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor rez...@hortonworks.com
 wrote:

  Your general procedure sounds correct (i.e. dropping your newly built
 .jar
  into $HD_HOME/lib/), but to make sure it's getting picked up, you should
  explicitly add $HD_HOME/lib/ to your exported HADOOP_CLASSPATH
 environment
  variable; here's mine, as an example:
 
  export HADOOP_CLASSPATH=.:./build/*.jar
 
  About your second point, you certainly need to copy this newly patched
  .jar to every node in your cluster, because my patch changes the value
 of a
  couple metrics emitted TO gmetad (FROM all the nodes in the cluster), so
  without copying it over to every node in the cluster, gmetad will still
  likely receive some bad metrics.
 
  Varun
 
 
  On Wed, Feb 8, 2012 at 6:19 PM, Merto Mertek masmer...@gmail.com
 wrote:
 
  I will need your help. Please confirm if the following procedure is
 right.
  I have a dev environment where I pimp my scheduler (no hadoop running)
 and
  a small cluster environment where the changes(jars) are deployed with
 some
  scripts,  however I have never compiled the whole hadoop from source so
 I
  do not know if I am doing it right. I' ve done it as follow:
 
  a) apply a patch
  b) cd $HD_HOME; ant
  c) copy $HD_HOME/*build*/patched-core-hadoop.jar -
  cluster:/$HD_HOME/*lib*
  d) run $HD_HOME/bin/start-all.sh
 
  Is this enough? When I tried to test hadoop dfs -ls / I could see
 that a
  new jar was not loaded and instead a jar from
  $HD_HOME/*share*/hadoop-20.205.0.jar
  was taken..
  Should I copy the entire hadoop folder to all nodes and reconfigure the
  entire cluster for the new build, or is enough if I configure it just on
  the node where gmetad will run?
 
 
 
 
 
 
  On 8 February 2012 06:33, Varun Kapoor rez...@hortonworks.com wrote:
 
   I'm so sorry, Merto - like a silly goose, I attached the 2 patches to
 my
   reply, and of course the mailing list did not accept the attachment.
  
   I plan on opening JIRAs for this tomorrow, but till then, here are
  links to
   the 2 patches (from my Dropbox account):
  
 - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.Hadoop.patch
 - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.gmetad.patch
  
   Here's hoping this works for you,
  
   Varun
   On Tue, Feb 7, 2012 at 6:00 PM, Merto Mertek masmer...@gmail.com
  wrote:
  
Varun, have I missed your link to the patches? I have tried to
 search
   them
on jira but I did not find them.. Can you repost the link for these
  two
patches?
   
Thank you..
   
On 7 February 2012 20:36, Varun Kapoor rez...@hortonworks.com
  wrote:
   
 I'm sorry to hear that gmetad cores continuously for you guys.
 Since
   I'm
 not seeing that behavior, I'm going to just put out the 2 possible
patches
 you could apply and wait to hear back from you. :)

 Option 1

 * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file (

   
  
 
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markupinmysetup
 )
  in your Hadoop sources and rebuild Hadoop.

 Option 2

 * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c
  and
 rebuild gmetad.

 Only 1 of these 2 fixes is required, and it would help me if you
  could
 first try Option 1 and let me know if that fixes things for you.

 Varun

 On Mon, Feb 6, 2012 at 10:36 PM, mete efk...@gmail.com wrote:

 Same with Merto's situation here, it always overflows short time
  after
the
 restart. Without the hadoop metrics enabled everything is smooth.
 Regards

 Mete

 On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek 
 masmer...@gmail.com
wrote:

  I have tried to run it but it repeats crashing..
 
   - When you start gmetad

Compile and deploy a new version of hadoop

2012-02-09 Thread Merto Mertek
I am having some trouble understanding how the whole thing works..

Compiling with ant works OK and I am able to build a jar which is afterwards
deployed to the cluster. On the cluster I've set the HADOOP_CLASSPATH
variable to point just to the jar files in the lib folder
($HD_HOME/lib/*.jar), where I put the newly compiled
hadoop-core-myversion.jar.

Before deploying I make sure that there is no previous version of
hadoop-core-xxx.jar or core-3.3.1.jar in the $HD_HOME folder or in
$HD_HOME/lib. The problem is that I suspect Hadoop is picking up the wrong
hadoop-core jar, so I am interested in how the whole mechanism works and what
the purpose of the $HD_HOME/share/hadoop folder is, where I can find other
hadoop-core jars and which is included in the classpath in hadoop-env.sh.


My last question is: what is the easiest way to see that your build is the
one actually up and running? Maybe from the version tag in the JT web UI?
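For a quick sanity check of which build the daemons actually see, something
like this helps (the classpath subcommand may not exist in every 0.20
release, so treat it as optional):

$HD_HOME/bin/hadoop version      # prints the version, svn revision and build time of the core jar in use
$HD_HOME/bin/hadoop classpath    # if available, prints the expanded classpath in the order it is searched

The jobtracker and namenode web UIs also print the same version/revision
string at the bottom of their pages, which answers the release-tag question.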

Thank you..


Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-08 Thread Merto Mertek
I will need your help. Please confirm whether the following procedure is
right. I have a dev environment where I work on my scheduler (no Hadoop
running) and a small cluster environment where the changes (jars) are
deployed with some scripts; however, I have never compiled the whole of
Hadoop from source, so I do not know if I am doing it right. I've done it as
follows:

a) apply a patch
b) cd $HD_HOME; ant
c) copy $HD_HOME/*build*/patched-core-hadoop.jar -> cluster:/$HD_HOME/*lib*
d) run $HD_HOME/bin/start-all.sh

Is this enough? When I tried to test hadoop dfs -ls / I could see that the
new jar was not loaded and a jar from $HD_HOME/*share*/hadoop-20.205.0.jar
was taken instead..
Should I copy the entire Hadoop folder to all nodes and reconfigure the
entire cluster for the new build, or is it enough if I configure it just on
the node where gmetad will run?






On 8 February 2012 06:33, Varun Kapoor rez...@hortonworks.com wrote:

 I'm so sorry, Merto - like a silly goose, I attached the 2 patches to my
 reply, and of course the mailing list did not accept the attachment.

 I plan on opening JIRAs for this tomorrow, but till then, here are links to
 the 2 patches (from my Dropbox account):

   - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.Hadoop.patch
   - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.gmetad.patch

 Here's hoping this works for you,

 Varun
 On Tue, Feb 7, 2012 at 6:00 PM, Merto Mertek masmer...@gmail.com wrote:

  Varun, have I missed your link to the patches? I have tried to search
 them
  on jira but I did not find them.. Can you repost the link for these two
  patches?
 
  Thank you..
 
  On 7 February 2012 20:36, Varun Kapoor rez...@hortonworks.com wrote:
 
   I'm sorry to hear that gmetad cores continuously for you guys. Since
 I'm
   not seeing that behavior, I'm going to just put out the 2 possible
  patches
   you could apply and wait to hear back from you. :)
  
   Option 1
  
   * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file (
  
 
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markupinmysetup)
  in your Hadoop sources and rebuild Hadoop.
  
   Option 2
  
   * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and
   rebuild gmetad.
  
   Only 1 of these 2 fixes is required, and it would help me if you could
   first try Option 1 and let me know if that fixes things for you.
  
   Varun
  
   On Mon, Feb 6, 2012 at 10:36 PM, mete efk...@gmail.com wrote:
  
   Same with Merto's situation here, it always overflows short time after
  the
   restart. Without the hadoop metrics enabled everything is smooth.
   Regards
  
   Mete
  
   On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek masmer...@gmail.com
  wrote:
  
I have tried to run it but it repeats crashing..
   
 - When you start gmetad and Hadoop is not emitting metrics,
  everything
   is peachy.

   
Right, running just ganglia without running hadoop jobs seems stable
   for at
least a day..
   
   
   - When you start Hadoop (and it thus starts emitting metrics),
   gmetad
   cores.

   
True, with a  following error : *** stack smashing detected ***:
  gmetad
terminated \n Segmentation fault
   
- On my MacBookPro, it's a SIGABRT due to a buffer overflow.

 I believe this is happening for everyone. What I would like for
 you
  to
try
 out are the following 2 scenarios:

   - Once gmetad cores, if you start it up again, does it core
 again?
   Does
   this process repeat ad infinitum?

- On my MBP, the core is a one-time thing, and restarting gmetad
  after the first core makes things run perfectly smoothly.
 - I know others are saying this core occurs continuously,
  but
they
 were all using ganglia-3.1.x, and I'm interested in how
 ganglia-3.2.0
 behaves for you.

   
It cores everytime I run it. The difference is just that sometimes a
segmentation faults appears instantly, and sometimes it appears
 after
  a
random time...lets say after a minute of running gmetad and
 collecting
data.
   
   
 - If you start Hadoop first (so gmetad is not running when
  the
   first batch of Hadoop metrics are emitted) and THEN start gmetad
   after
a
   few seconds, do you still see gmetad coring?

   
Yes
   
   
  - On my MBP, this sequence works perfectly fine, and there
 are
  no
  gmetad cores whatsoever.

   
I have tested this scenario with 2 working nodes so two gmond plus
 the
   head
gmond on the server where gmetad is located. I have checked and all
 of
   them
are versioned 3.2.0.
   
Hope it helps..
   
   
   

 Bear in mind that this only addresses the gmetad coring issue -
 the
 warnings emitted about '4.9E-324' being out of range will
 continue,
   but I
 know what's causing

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-07 Thread Merto Mertek
Varun, have I missed your link to the patches? I tried to search for them on
JIRA but I did not find them.. Can you repost the links to these two patches?

Thank you..

On 7 February 2012 20:36, Varun Kapoor rez...@hortonworks.com wrote:

 I'm sorry to hear that gmetad cores continuously for you guys. Since I'm
 not seeing that behavior, I'm going to just put out the 2 possible patches
 you could apply and wait to hear back from you. :)

 Option 1

 * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file (
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markupin
  my setup) in your Hadoop sources and rebuild Hadoop.

 Option 2

 * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and
 rebuild gmetad.

 Only 1 of these 2 fixes is required, and it would help me if you could
 first try Option 1 and let me know if that fixes things for you.

 Varun

 On Mon, Feb 6, 2012 at 10:36 PM, mete efk...@gmail.com wrote:

 Same with Merto's situation here, it always overflows short time after the
 restart. Without the hadoop metrics enabled everything is smooth.
 Regards

 Mete

 On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek masmer...@gmail.com wrote:

  I have tried to run it but it repeats crashing..
 
   - When you start gmetad and Hadoop is not emitting metrics, everything
 is peachy.
  
 
  Right, running just ganglia without running hadoop jobs seems stable
 for at
  least a day..
 
 
 - When you start Hadoop (and it thus starts emitting metrics),
 gmetad
 cores.
  
 
  True, with a  following error : *** stack smashing detected ***: gmetad
  terminated \n Segmentation fault
 
  - On my MacBookPro, it's a SIGABRT due to a buffer overflow.
  
   I believe this is happening for everyone. What I would like for you to
  try
   out are the following 2 scenarios:
  
 - Once gmetad cores, if you start it up again, does it core again?
 Does
 this process repeat ad infinitum?
  
  - On my MBP, the core is a one-time thing, and restarting gmetad
after the first core makes things run perfectly smoothly.
   - I know others are saying this core occurs continuously, but
  they
   were all using ganglia-3.1.x, and I'm interested in how
   ganglia-3.2.0
   behaves for you.
  
 
  It cores everytime I run it. The difference is just that sometimes a
  segmentation faults appears instantly, and sometimes it appears after a
  random time...lets say after a minute of running gmetad and collecting
  data.
 
 
   - If you start Hadoop first (so gmetad is not running when the
 first batch of Hadoop metrics are emitted) and THEN start gmetad
 after
  a
 few seconds, do you still see gmetad coring?
  
 
  Yes
 
 
- On my MBP, this sequence works perfectly fine, and there are no
gmetad cores whatsoever.
  
 
  I have tested this scenario with 2 working nodes so two gmond plus the
 head
  gmond on the server where gmetad is located. I have checked and all of
 them
  are versioned 3.2.0.
 
  Hope it helps..
 
 
 
  
   Bear in mind that this only addresses the gmetad coring issue - the
   warnings emitted about '4.9E-324' being out of range will continue,
 but I
   know what's causing that as well (and hope that my patch fixes it for
   free).
  
   Varun
   On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek masmer...@gmail.com
  wrote:
  
Yes I am encoutering the same problems and like Mete said  few
 seconds
after restarting a segmentation fault appears.. here is my conf..
http://pastebin.com/VgBjp08d
   
And here are some info from /var/log/messages (ubuntu server 10.10):
   
kernel: [424447.140641] gmetad[26115] general protection
  ip:7f7762428fdb
 sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]

   
When I compiled gmetad I used the following command:
   
./configure --with-gmetad --sysconfdir=/etc/ganglia
 CPPFLAGS=-I/usr/local/rrdtool-1.4.7/include
 CFLAGS=-I/usr/local/rrdtool-1.4.7/include
 LDFLAGS=-L/usr/local/rrdtool-1.4.7/lib

   
The same was tried with rrdtool 1.4.5. My current ganglia version is
   3.2.0
and like Mete I tried it with version 3.1.7 but without success..
   
Hope we will sort it out soon any solution..
thank you
   
   
On 6 February 2012 20:09, mete efk...@gmail.com wrote:
   
 Hello,
 i also face this issue when using GangliaContext31 and
 hadoop-1.0.0,
   and
 ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer
 overflows
   as
 soon as i restart the gmetad.
 Regards
 Mete

 On Mon, Feb 6, 2012 at 7:42 PM, Vitthal Suhas Gogate 
 gog...@hortonworks.com wrote:

  I assume you have seen the following information on Hadoop
 twiki,
  http://wiki.apache.org/hadoop/GangliaMetrics
 
  So do you use GangliaContext31 in hadoop-metrics2.properties?
 
  We use Ganglia 3.2 with Hadoop

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-06 Thread Merto Mertek
Yes, I am encountering the same problems, and like Mete said, a few seconds
after restarting a segmentation fault appears.. here is my conf:
http://pastebin.com/VgBjp08d

And here are some info from /var/log/messages (ubuntu server 10.10):

kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb
 sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]


When I compiled gmetad I used the following command:

./configure --with-gmetad --sysconfdir=/etc/ganglia
 CPPFLAGS=-I/usr/local/rrdtool-1.4.7/include
 CFLAGS=-I/usr/local/rrdtool-1.4.7/include
 LDFLAGS=-L/usr/local/rrdtool-1.4.7/lib


The same was tried with rrdtool 1.4.5. My current Ganglia version is 3.2.0,
and like Mete I also tried version 3.1.7, but without success..

Hope we will sort out a solution soon..
thank you


On 6 February 2012 20:09, mete efk...@gmail.com wrote:

 Hello,
 i also face this issue when using GangliaContext31 and hadoop-1.0.0, and
 ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows as
 soon as i restart the gmetad.
 Regards
 Mete

 On Mon, Feb 6, 2012 at 7:42 PM, Vitthal Suhas Gogate 
 gog...@hortonworks.com wrote:

  I assume you have seen the following information on Hadoop twiki,
  http://wiki.apache.org/hadoop/GangliaMetrics
 
  So do you use GangliaContext31 in hadoop-metrics2.properties?
 
  We use Ganglia 3.2 with Hadoop 20.205  and works fine (I remember seeing
  gmetad sometime goes down due to buffer overflow problem when hadoop
 starts
  pumping in the metrics.. but restarting works.. let me know if you face
  same problem?
 
  --Suhas
 
  Additionally, the Ganglia protocol change significantly between Ganglia
 3.0
  and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with Ganglia 3.0
  clients). This caused Hadoop to not work with Ganglia 3.1; there is a
 patch
  available for this, HADOOP-4675. As of November 2010, this patch has been
  rolled into the mainline for 0.20.2 and later. To use the Ganglia 3.1
  protocol in place of the 3.0, substitute
  org.apache.hadoop.metrics.ganglia.GangliaContext31 for
  org.apache.hadoop.metrics.ganglia.GangliaContext in the
  hadoop-metrics.properties lines above.
 
  On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek masmer...@gmail.com
 wrote:
 
   I spent a lot of time to figure it out however i did not find a
 solution.
   Problems from the logs pointed me for some bugs in rrdupdate tool,
  however
   i tried to solve it with different versions of ganglia and rrdtool but
  the
   error is the same. Segmentation fault appears after the following
 lines,
  if
   I run gmetad in debug mode...
  
   Created rrd
  
  
 
 /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd
   Created rrd
  
  
 
 /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd
   
  
   which I suppose are generated from MetricsSystemImpl.java (Is there any
  way
   just to disable this two metrics?)
  
   From the /var/log/messages there are a lot of errors:
  
   xxx gmetad[15217]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd):
   converting  '4.9E-324' to float: Numerical result out of range
   xxx gmetad[15217]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
   converting  '4.9E-324' to float: Numerical result out of range
  
   so probably there are some converting issues ? Where should I look for
  the
   solution? Would you rather suggest to use ganglia 3.0.x with the old
   protocol and leave the version 3.1 for further releases?
  
   any help is realy appreciated...
  
   On 1 February 2012 04:04, Merto Mertek masmer...@gmail.com wrote:
  
I would be glad to hear that too.. I've setup the following:
   
Hadoop 0.20.205
Ganglia Front  3.1.7
Ganglia Back *(gmetad)* 3.1.7
RRDTool http://www.rrdtool.org/ 1.4.5. - i had some troubles
installing 1.4.4
   
Ganglia works just in case hadoop is not running, so metrics are not
publshed to gmetad node (conf with new hadoop-metrics2.proprieties).
  When
hadoop is started, a segmentation fault appears in gmetad deamon:
   
sudo gmetad -d 2
...
Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
Updating host xxx, metric bytes_in
Updating host xxx, metric bytes_out
Updating host xxx, metric
 metricssystem.MetricsSystem.publish_max_time
Created rrd
   
  
 
 /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
Segmentation fault
   
And some info from the apache log http://pastebin.com/nrqKRtKJ..
   
Can someone suggest a ganglia version that is tested with hadoop
   0.20.205?
I will try to sort it out however it seems a not so tribial problem..
   
Thank you
   
   
   
   
   
On 2 December 2011 12:32, praveenesh kumar praveen...@gmail.com
  wrote:
   
or Do I have to apply some hadoop patch

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-06 Thread Merto Mertek
I have tried to run it but it keeps crashing..

  - When you start gmetad and Hadoop is not emitting metrics, everything
   is peachy.


Right, running just ganglia without running hadoop jobs seems stable for at
least a day..


   - When you start Hadoop (and it thus starts emitting metrics), gmetad
   cores.


True, with the following error: *** stack smashing detected ***: gmetad
terminated \n Segmentation fault

 - On my MacBookPro, it's a SIGABRT due to a buffer overflow.

 I believe this is happening for everyone. What I would like for you to try
 out are the following 2 scenarios:

   - Once gmetad cores, if you start it up again, does it core again? Does
   this process repeat ad infinitum?

 - On my MBP, the core is a one-time thing, and restarting gmetad
  after the first core makes things run perfectly smoothly.
 - I know others are saying this core occurs continuously, but they
 were all using ganglia-3.1.x, and I'm interested in how
 ganglia-3.2.0
 behaves for you.


It cores every time I run it. The difference is just that sometimes the
segmentation fault appears instantly, and sometimes it appears after a random
time... let's say after a minute of running gmetad and collecting data.


 - If you start Hadoop first (so gmetad is not running when the
   first batch of Hadoop metrics are emitted) and THEN start gmetad after a
   few seconds, do you still see gmetad coring?


Yes


  - On my MBP, this sequence works perfectly fine, and there are no
  gmetad cores whatsoever.


I have tested this scenario with 2 worker nodes, so two gmonds plus the head
gmond on the server where gmetad is located. I have checked, and all of them
are version 3.2.0.

Hope it helps..




 Bear in mind that this only addresses the gmetad coring issue - the
 warnings emitted about '4.9E-324' being out of range will continue, but I
 know what's causing that as well (and hope that my patch fixes it for
 free).

 Varun
 On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek masmer...@gmail.com wrote:

  Yes I am encoutering the same problems and like Mete said  few seconds
  after restarting a segmentation fault appears.. here is my conf..
  http://pastebin.com/VgBjp08d
 
  And here are some info from /var/log/messages (ubuntu server 10.10):
 
  kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb
   sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]
  
 
  When I compiled gmetad I used the following command:
 
  ./configure --with-gmetad --sysconfdir=/etc/ganglia
   CPPFLAGS=-I/usr/local/rrdtool-1.4.7/include
   CFLAGS=-I/usr/local/rrdtool-1.4.7/include
   LDFLAGS=-L/usr/local/rrdtool-1.4.7/lib
  
 
  The same was tried with rrdtool 1.4.5. My current ganglia version is
 3.2.0
  and like Mete I tried it with version 3.1.7 but without success..
 
  Hope we will sort it out soon any solution..
  thank you
 
 
  On 6 February 2012 20:09, mete efk...@gmail.com wrote:
 
   Hello,
   i also face this issue when using GangliaContext31 and hadoop-1.0.0,
 and
   ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows
 as
   soon as i restart the gmetad.
   Regards
   Mete
  
   On Mon, Feb 6, 2012 at 7:42 PM, Vitthal Suhas Gogate 
   gog...@hortonworks.com wrote:
  
I assume you have seen the following information on Hadoop twiki,
http://wiki.apache.org/hadoop/GangliaMetrics
   
So do you use GangliaContext31 in hadoop-metrics2.properties?
   
We use Ganglia 3.2 with Hadoop 20.205  and works fine (I remember
  seeing
gmetad sometime goes down due to buffer overflow problem when hadoop
   starts
pumping in the metrics.. but restarting works.. let me know if you
 face
same problem?
   
--Suhas
   
Additionally, the Ganglia protocol change significantly between
 Ganglia
   3.0
and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with Ganglia 3.0
clients). This caused Hadoop to not work with Ganglia 3.1; there is a
   patch
available for this, HADOOP-4675. As of November 2010, this patch has
  been
rolled into the mainline for 0.20.2 and later. To use the Ganglia 3.1
protocol in place of the 3.0, substitute
org.apache.hadoop.metrics.ganglia.GangliaContext31 for
org.apache.hadoop.metrics.ganglia.GangliaContext in the
hadoop-metrics.properties lines above.
   
On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek masmer...@gmail.com
   wrote:
   
 I spent a lot of time to figure it out however i did not find a
   solution.
 Problems from the logs pointed me for some bugs in rrdupdate tool,
however
 i tried to solve it with different versions of ganglia and rrdtool
  but
the
 error is the same. Segmentation fault appears after the following
   lines,
if
 I run gmetad in debug mode...

 Created rrd


   
  
 
 /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd
 Created rrd


   
  
 
 /var/lib/ganglia

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-03 Thread Merto Mertek
I spent a lot of time trying to figure it out, but I did not find a solution.
Problems in the logs pointed me to some bugs in the rrdupdate tool; I tried
different versions of ganglia and rrdtool, but the error is the same. A
segmentation fault appears after the following lines if I run gmetad in debug
mode...

Created rrd
/var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd
Created rrd
/var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd


which I suppose are generated from MetricsSystemImpl.java (is there any way
to disable just these two metrics?)

In /var/log/messages there are a lot of errors:

xxx gmetad[15217]: RRD_update
(/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd):
converting  '4.9E-324' to float: Numerical result out of range
xxx gmetad[15217]: RRD_update
(/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
converting  '4.9E-324' to float: Numerical result out of range

so probably there are some conversion issues? Where should I look for the
solution? Would you rather suggest using ganglia 3.0.x with the old protocol
and leaving version 3.1 for later releases?

any help is really appreciated...

On 1 February 2012 04:04, Merto Mertek masmer...@gmail.com wrote:

 I would be glad to hear that too.. I've setup the following:

 Hadoop 0.20.205
 Ganglia Front  3.1.7
 Ganglia Back *(gmetad)* 3.1.7
 RRDTool http://www.rrdtool.org/ 1.4.5. - i had some troubles
 installing 1.4.4

 Ganglia works just in case hadoop is not running, so metrics are not
 publshed to gmetad node (conf with new hadoop-metrics2.proprieties). When
 hadoop is started, a segmentation fault appears in gmetad deamon:

 sudo gmetad -d 2
 ...
 Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
 Updating host xxx, metric bytes_in
 Updating host xxx, metric bytes_out
 Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time
 Created rrd
 /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
 Segmentation fault

 And some info from the apache log http://pastebin.com/nrqKRtKJ..

 Can someone suggest a ganglia version that is tested with hadoop 0.20.205?
 I will try to sort it out however it seems a not so tribial problem..

 Thank you





 On 2 December 2011 12:32, praveenesh kumar praveen...@gmail.com wrote:

 or Do I have to apply some hadoop patch for this ?

 Thanks,
 Praveenesh





Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-01-31 Thread Merto Mertek
I would be glad to hear that too.. I've set up the following:

Hadoop 0.20.205
Ganglia Front  3.1.7
Ganglia Back *(gmetad)* 3.1.7
RRDTool (http://www.rrdtool.org/) 1.4.5 - I had some trouble installing
1.4.4

Ganglia works only as long as Hadoop is not running, i.e. while no metrics
are published to the gmetad node (configured with the new
hadoop-metrics2.properties). When Hadoop is started, a segmentation fault
appears in the gmetad daemon:

sudo gmetad -d 2
...
Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
Updating host xxx, metric bytes_in
Updating host xxx, metric bytes_out
Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time
Created rrd
/var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
Segmentation fault

And some info from the apache log http://pastebin.com/nrqKRtKJ..

Can someone suggest a Ganglia version that has been tested with Hadoop
0.20.205? I will try to sort it out, however it does not seem to be a trivial
problem..

Thank you





On 2 December 2011 12:32, praveenesh kumar praveen...@gmail.com wrote:

 or Do I have to apply some hadoop patch for this ?

 Thanks,
 Praveenesh



Configure hadoop scheduler

2011-12-20 Thread Merto Mertek
Hi,

I am having problems changing the default Hadoop scheduler (I assume that the
default scheduler is the FIFO scheduler).

I am following the guide located in the hadoop/docs directory, however I am
not able to get it running. The link for scheduling administration returns an
HTTP 404 error (http://localhost:50030/scheduler). In the UI, under
scheduling information, I can see only one queue named 'default'. The
mapred-site.xml file is definitely being read, because when I change the
jobtracker port I can see the daemon running on the changed port. The
variable $HADOOP_CONFIG_DIR was added to .bashrc, but that did not solve the
problem. I tried rebuilding Hadoop, manually placing the fair scheduler jar
in hadoop/lib and changing the Hadoop classpath in hadoop-env.sh to point to
the lib folder, but without success. The only scheduler-related information
in the jobtracker log is the following:

Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
 limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)


I have been working on this for several days and am running out of ideas...
I am wondering how to fix it and where to check the currently active
scheduler parameters.

Config files:
mapred-site.xml http://pastebin.com/HmDfWqE1
allocation.xml http://pastebin.com/Uexq7uHV
Tried versions: 0.20.203 and 204
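For comparison, a minimal mapred-site.xml fragment that switches the
jobtracker to the fair scheduler on 0.20.20x looks like this (the allocation
file path is only an example); it only takes effect if the
hadoop-fairscheduler jar is on the jobtracker's classpath:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/usr/local/hadoop/conf/allocations.xml</value>
</property>

With that in place and the jar loaded, http://localhost:50030/scheduler
should stop returning 404.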

Thank you


Re: Desperate!!!! Expanding,shrinking cluster or replacing failed nodes.

2011-12-20 Thread Merto Mertek
I followed the same tutorial as you. If I am not wrong, the problem arises
because you first tried to run the node as a single node and then joined it
to the cluster (like Arpit mentioned). After testing that the new node works
OK, try deleting the contents of the /app/hadoop/tmp/ directory and then add
the new node to the cluster. When I set up the config files on the new node I
followed this procedure:

DATANODE
setup config files (look the tutorial)
/usr/local/hadoop/bin/hadoop-daemon.sh start datanode
/usr/local/hadoop/bin/hadoop-daemon.sh start tasktracker
---
MASTER
$hdbin/hadoop dfsadmin -report
nano /usr/local/hadoop/conf/slaves (add a new node)
$hdbin/hadoop dfsadmin -refreshNodes
$hdbin/hadoop namenode restart
$hdbin/hadoop jobtracker restart
($hdbin/hadoop balancer  )
($hdbin/hadoop dfsadmin -report )

Hope it helps..

On 20 December 2011 18:38, Arpit Gupta ar...@hortonworks.com wrote:

 On the new nodes you are trying to add make sure the  dfs/data directories
 are empty. You probably have a VERSION file from an older deploy and thus
 causing the incompatible namespaceId error.


 --
 Arpit
 ar...@hortonworks.com


 On Dec 20, 2011, at 5:35 AM, Sloot, Hans-Peter wrote:

 
 
  But I ran into the : java.io.IOException: Incompatible namespaceIDs
 error every time.
  Should I config the files :  dfs/data/current/VERSION and
 dfs/name/current/VERSION  and  conf/*site.xml
  from other existing nodes?
 
 
 
 
 
  -Original Message-
  From: Harsh J [mailto:ha...@cloudera.com]
  Sent: dinsdag 20 december 2011 14:30
  To: common-user@hadoop.apache.org
  Cc: hdfs-...@hadoop.apache.org
  Subject: Re: Desperate Expanding,shrinking cluster or replacing
 failed nodes.
 
  Hans-Peter,
 
  Adding new nodes is simply (assuming network setup is sane and done):
 
  - Install/unpack services on new machine.
  - Deploy a config copy for the services.
  - Start the services.
 
  You should *not* format a NameNode *ever*, after the first time you
 start it up. Formatting loses all data of HDFS, so don't even think about
 that after the first time you use it :)
 
  On 20-Dec-2011, at 6:12 PM, Sloot, Hans-Peter wrote:
 
  Hello all,
 
  I have asked this question a couple of days ago but no one responded.
 
  I built a 6 node hadoop cluster, guided Michael Noll, starting with a
 single node and expanding it one by one.
  Every time I expanded the cluster I ran into error :
 java.io.IOException: Incompatible namespaceIDs
 
  So now my question is what is the correct procedure for expanding,
 shrinking a cluster?
  And how to replace a failed node?
 
  Can someone  point me to the correct manuals.
  I have already looked at the available documents on the wiki and
 hadoop.apache.org but could not find the answers.
 
  Regards Hans-Peter
 
 
 
 
 
  Dit bericht is vertrouwelijk en kan geheime informatie bevatten enkel
 bestemd voor de geadresseerde. Indien dit bericht niet voor u is bestemd,
 verzoeken wij u dit onmiddellijk aan ons te melden en het bericht te
 vernietigen. Aangezien de integriteit van het bericht niet veilig gesteld
 is middels verzending via internet, kan Atos Nederland B.V. niet
 aansprakelijk worden gehouden voor de inhoud daarvan. Hoewel wij ons
 inspannen een virusvrij netwerk te hanteren, geven wij geen enkele garantie
 dat dit bericht virusvrij is, noch aanvaarden wij enige aansprakelijkheid
 voor de mogelijke aanwezigheid van een virus in dit bericht. Op al onze
 rechtsverhoudingen, aanbiedingen en overeenkomsten waaronder Atos Nederland
 B.V. goederen en/of diensten levert zijn met uitsluiting van alle andere
 voorwaarden de Leveringsvoorwaarden van Atos Nederland B.V. van toepassing.
 Deze worden u op aanvraag direct kosteloos toegezonden.
 
  This e-mail and the documents attached are confidential and intended
 solely for the addressee; it may also be privileged. If you receive this
 e-mail in error, please notify the sender immediately and destroy it. As
 its integrity cannot be secured on the Internet, the Atos Nederland B.V.
 group liability cannot be triggered for the message content. Although the
 sender endeavours to maintain a computer virus-free network, the sender
 does not warrant that this transmission is virus-free and will not be
 liable for any damages resulting from any virus transmitted. On all offers
 and agreements under which Atos Nederland B.V. supplies goods and/or
 services of whatever nature, the Terms of Delivery from Atos Nederland B.V.
 exclusively apply. The Terms of Delivery shall be promptly submitted to you
 on your request.
 
  Atos Nederland B.V. / Utrecht
  KvK Utrecht 30132762
 
 
 
 
 
 
 
 
  Dit bericht is vertrouwelijk en kan geheime informatie bevatten enkel
 bestemd voor de geadresseerde. Indien dit bericht niet voor u is bestemd,
 verzoeken wij u dit onmiddellijk aan ons te melden en het 

Re: TestFairScheduler failing - version 0.20. security 204

2011-12-18 Thread Merto Mertek
I figured out that if I run the tests from the console with ant
test-fairscheduler (my modification of the test target in
src/contrib/build.xml), all tests run OK. If I understand this correctly,
testing is probably always done with ant and the test classes are never meant
to be run from the Eclipse IDE.

Because I am rather new to all of this, I would like to hear how you develop
a new feature and how you test it. In my situation I would do it as follows
(a command-level sketch follows the list):
- develop a new feature (make some code modifications)
- build the scheduler with ant
- write unit tests
- run the test class from ant
- deploy the new scheduler build/jar to a cluster
- try it on a working cluster
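A rough command-level sketch of that cycle, assuming the stock branch-0.20
ant build (target names and jar locations can differ slightly between
releases, and the host names are placeholders):

cd $HD_HOME
ant jar                                           # rebuild the core jar (if core classes were touched)
ant test-contrib -Dtestcase=TestFairScheduler     # run just the scheduler tests, if the contrib build honours -Dtestcase
ant package                                       # also builds contrib jars under build/contrib/fairscheduler/
scp build/contrib/fairscheduler/*.jar master:$HD_HOME/lib/   # copy the rebuilt scheduler jar to the jobtracker node
$HD_HOME/bin/stop-all.sh && $HD_HOME/bin/start-all.sh        # on the master: restart so the new jar is picked up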

Is there any other way to try new functionality locally or otherwise? Any
comments and suggestions are welcome.
Thank you..




On 17 December 2011 21:58, Merto Mertek masmer...@gmail.com wrote:

 Hi,

 I am having some problems with running the following test file

 org.apache.hadoop.mapred.TestFairScheduler

 Nearly all tests fail, most of them with the error
 java.lang.RuntimeException: COULD NOT START JT. Here is a trace:
 http://pastebin.com/Jx90sYbw
 The code was checked out from the svn branch, then I ran ant build and ant
 eclipse. The tests were run inside Eclipse.

 I would like to solve those problems before modifying the scheduler. Any
 hints appreciated. Probably just some config issue?

 Thank you






TestFairScheduler failing - version 0.20. security 204

2011-12-17 Thread Merto Mertek
Hi,

I am having some problems with running the following test file

org.apache.hadoop.mapred.TestFairScheduler

Nearly all tests fail, most of them with the error
java.lang.RuntimeException: COULD NOT START JT. Here is a trace:
http://pastebin.com/Jx90sYbw
The code was checked out from the svn branch, then I ran ant build and ant
eclipse. The tests were run inside Eclipse.

I would like to solve these problems before modifying the scheduler. Any
hints are appreciated. Is it perhaps just a configuration issue?

Thank you


Re: Environment consideration for a research on scheduling

2011-09-27 Thread Merto Mertek
The desktop edition was chosen just to run the namenode and to monitor
cluster statistics. The worker nodes run the Ubuntu server edition because we
found this configuration in several research papers. One such configuration
can be found in the paper on the LATE scheduler (is any source code for it
available, or is it integrated into the new fair scheduler?)

thanks for the provided tools..

On 26 September 2011 11:41, Steve Loughran ste...@apache.org wrote:

 On 23/09/11 16:09, GOEKE, MATTHEW (AG/1000) wrote:

 If you are starting from scratch with no prior Hadoop install experience, I
 would configure stand-alone, migrate to pseudo-distributed and then to fully
 distributed, verifying functionality at each step by doing a simple word
 count run. Also, if you don't mind using the CDH distribution, then SCM /
 their rpms will greatly simplify both the bin installs as well as the user
 creation.

 Your VM route will most likely work, but I can imagine the number of
 hiccups during migration from that to the real cluster will not make it
 worth your time.

 Matt

 -Original Message-
 From: Merto Mertek [mailto:masmer...@gmail.com]
 Sent: Friday, September 23, 2011 10:00 AM
 To: common-user@hadoop.apache.org
 Subject: Environment consideration for a research on scheduling

 Hi,
 in the first phase we are planning to establish a small cluster with a few
 commodity computers (each with 1 GB of RAM and a 200 GB disk). The cluster
 would run Ubuntu Server 10.10 and a Hadoop build from the branch 0.20.204 (I
 had some issues with version 0.20.203 with missing libraries:
 http://hadoop-common.472056.n3.nabble.com/Development-enviroment-problems-eclipse-hadoop-0-20-203-td3186022.html#a3188567).
 Would you suggest any other version?


 I wouldn't rush to put Ubuntu 10.x on; it makes a good desktop, but RHEL and
 CentOS are the platforms of choice on the server side.




 In the second phase we are planning to analyse, test and modify some of the
 Hadoop schedulers.


 The main schedulers used by Y! and FB are fairly well tuned for their
 workloads, and apparently not something you'd want to play with. There is at
 least one other scheduler in the contrib/ dir to play with.

 The other thing about scheduling is that you may have a faster development
 cycle if, instead of working on a real cluster, you simulate it at multiples
 of real time, using stats collected from your own workload by way of the
 gridmix2 tools. I've never done scheduling work, but I think there's some
 stuff there to do that. If not, it's a possible contribution.

 Be aware that the changes in 0.23+ will change resource scheduling; that may
 be a better place to do development, with a plan to deploy in 2012. Oh, and
 get on the mapreduce lists, esp. the -dev list, to discuss issues.



  The information contained in this email may be subject to the export
 control laws and regulations of the United States, potentially
 including but not limited to the Export Administration Regulations (EAR)
 and sanctions regulations issued by the U.S. Department of
 Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this
 information you are obligated to comply with all
 applicable U.S. export laws and regulations.


 I have no idea what that means but am not convinced that reading an email
 forces me to comply with a different country's rules



Re: Environment consideration for a research on scheduling

2011-09-24 Thread Merto Mertek
 I agree, we will go the standard route. As you suggested, we will go step
by step to the full cluster deployment. After configuring the first node
we will use Clonezilla to replicate it and then set the nodes up one by one..

On the worker nodes I was thinking of running Ubuntu Server, while the
namenode will run Ubuntu Desktop. I am interested in how I should configure the
environment so that I will be able to remotely monitor, analyse and configure
the cluster. I will submit jobs from outside the local network via ssh to the
namenode; however, in that situation I will not be able to access the web
interfaces of the jobtracker and tasktrackers. So I am wondering how to analyze
them, and how you configured your environment to be as practical as possible.
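
For context, the web interfaces in question are plain HTTP services whose bind
addresses come from mapred-default.xml; the snippet below shows the stock
0.20.x defaults (JobTracker UI on port 50030, TaskTracker UI on 50060), which
one would typically reach from outside by forwarding those ports over the
existing ssh connection to the namenode:

  <!-- mapred-default.xml (shown for reference, not something to change here) -->
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:50060</value>
  </property>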

For monitoring the cluster I saw that Ganglia is one of the options, but at
this stage of testing the job-history files will probably be enough..

On 23 September 2011 17:09, GOEKE, MATTHEW (AG/1000) 
matthew.go...@monsanto.com wrote:

 If you are starting from scratch with no prior Hadoop install experience, I
 would configure stand-alone, migrate to pseudo-distributed and then to fully
 distributed, verifying functionality at each step by doing a simple word
 count run. Also, if you don't mind using the CDH distribution, then SCM /
 their rpms will greatly simplify both the bin installs as well as the user
 creation.

 Your VM route will most likely work, but I can imagine the number of hiccups
 during migration from that to the real cluster will not make it worth your
 time.

 Matt

 -Original Message-
 From: Merto Mertek [mailto:masmer...@gmail.com]
 Sent: Friday, September 23, 2011 10:00 AM
 To: common-user@hadoop.apache.org
 Subject: Environment consideration for a research on scheduling

 Hi,
 in the first phase we are planning to establish a small cluster with a few
 commodity computers (each with 1 GB of RAM and a 200 GB disk). The cluster
 would run Ubuntu Server 10.10 and a Hadoop build from the branch 0.20.204 (I
 had some issues with version 0.20.203 with missing libraries:
 http://hadoop-common.472056.n3.nabble.com/Development-enviroment-problems-eclipse-hadoop-0-20-203-td3186022.html#a3188567).
 Would you suggest any other version?

 In the second phase we are planning to analyse, test and modify some of the
 Hadoop schedulers.

 Now I am interested in what is the best way to deploy Ubuntu and Hadoop to
 these few machines. I was thinking of configuring the system in a local VM
 and then converting it for each physical machine, but probably this is not
 the best option. If you know any other, please share..

 Thank you!




Unsubscribe from jira issues

2011-09-23 Thread Merto Mertek
Hi,
I am receiving messages from two mailing lists (common-dev, common-user)
and I would like to stop receiving messages from JIRA. I am not a member of
the common-issues list. Can I disable this somehow? Thank you


Re: Unsubscribe from jira issues

2011-09-23 Thread Merto Mertek
Probably there is not any option just to disable jira issues.. I will
probably need the common-dev list so I will stay subscribed..

Thank you...

On 23 September 2011 16:11, Harsh J ha...@cloudera.com wrote:

 Merto,

 You need common-dev-unsubscribe@

 The common-dev list receives just JIRA opened/resolved/reopened
 messages. The common-issues receives everything.

 On Fri, Sep 23, 2011 at 7:27 PM, Merto Mertek masmer...@gmail.com wrote:
  Hi,
  I am receiving messages from two mailing lists
 (common-dev, common-user)
  and I would like to stop receiving messages from JIRA. I am not a member of
  the common-issues list. Can I disable this somehow? Thank you
 



 --
 Harsh J



Re: Unsubscribe from jira issues

2011-09-23 Thread Merto Mertek
hehe :) you are right :)

On 23 September 2011 16:21, Harsh J ha...@cloudera.com wrote:

 Merto,

 Am sure your mail client has some form of filtering available in that case!
 :-)

 On Fri, Sep 23, 2011 at 7:49 PM, Merto Mertek masmer...@gmail.com wrote:
  Probably there is no option to disable just the JIRA issues.. I will
  probably need the common-dev list, so I will stay subscribed..
 
  Thank you...
 
  On 23 September 2011 16:11, Harsh J ha...@cloudera.com wrote:
 
  Merto,
 
  You need common-dev-unsubscribe@
 
  The common-dev list receives just JIRA opened/resolved/reopened
  messages. The common-issues receives everything.
 
  On Fri, Sep 23, 2011 at 7:27 PM, Merto Mertek masmer...@gmail.com
 wrote:
   Hi,
   I am receiving messages from two mailing lists
  (common-dev, common-user)
   and I would like to stop receiving messages from JIRA. I am not a member of
   the common-issues list. Can I disable this somehow? Thank you
  
 
 
 
  --
  Harsh J
 
 



 --
 Harsh J



Environment consideration for a research on scheduling

2011-09-23 Thread Merto Mertek
Hi,
in the first phase we are planning to establish a small cluster with a few
commodity computers (each with 1 GB of RAM and a 200 GB disk). The cluster
would run Ubuntu Server 10.10 and a Hadoop build from the branch 0.20.204 (I
had some issues with version 0.20.203 with missing libraries:
http://hadoop-common.472056.n3.nabble.com/Development-enviroment-problems-eclipse-hadoop-0-20-203-td3186022.html#a3188567).
Would you suggest any other version?

In the second phase we are planning to analyse, test and modify some of the
Hadoop schedulers.

Now I am interested in what is the best way to deploy Ubuntu and Hadoop to
these few machines. I was thinking of configuring the system in a local VM and
then converting it for each physical machine, but probably this is not the
best option. If you know any other, please share..

Thank you!