Hadoop-on-demand and torque

2012-05-17 Thread Merto Mertek
If I understand it correctly, HOD is intended mainly for running Hadoop on
existing HPC clusters and for testing purposes..

I cannot find what the role of Torque is here (just the initial node
allocation?) and which scheduler HOD uses by default. Probably the
scheduler from the Hadoop distribution?

The docs mention the Maui scheduler, but if there were an integration with
Hadoop, there would presumably be some documentation on it..

thanks..


Re: Distributing MapReduce on a computer cluster

2012-04-25 Thread Merto Mertek
For load distribution you can start by reading about the different types of
Hadoop schedulers. I have not yet studied implementations other than Hadoop,
but a very simplified version of the distribution concept is the following:

a) The tasktracker asks for work (the heartbeat carries the status of the
worker node, e.g. the number of free slots)
b) The jobtracker picks a job from a list that is sorted according to the
configured policy (fair scheduling, FIFO, LIFO, other SLAs)
c) The tasktracker executes the assigned map/reduce tasks

As mentioned before, there are a lot more details.. In b) there is an
implementation of delay scheduling, which improves throughput by taking the
location of a job's input data into account when picking a job. There is also
a preemption mechanism that regulates fairness between pools, etc..
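To make step b) concrete, here is a toy, self-contained illustration (not
Hadoop code; all class and field names are made up): the pending-job list is
simply re-sorted by the active policy's comparator before the next job is
picked.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PolicySortDemo {
  static class PendingJob {
    final String name;
    final long submitTime;   // FIFO key
    final int runningTasks;  // crude stand-in for a fair-share deficit
    PendingJob(String name, long submitTime, int runningTasks) {
      this.name = name; this.submitTime = submitTime; this.runningTasks = runningTasks;
    }
  }

  public static void main(String[] args) {
    List<PendingJob> jobs = new ArrayList<PendingJob>();
    jobs.add(new PendingJob("etl", 1L, 40));
    jobs.add(new PendingJob("adhoc-query", 2L, 0));

    // FIFO: oldest submission first
    Collections.sort(jobs, new Comparator<PendingJob>() {
      public int compare(PendingJob a, PendingJob b) {
        return Long.compare(a.submitTime, b.submitTime);
      }
    });
    System.out.println("FIFO gives the free slot to: " + jobs.get(0).name);

    // "Fair": the job furthest below its share (fewest running tasks) first
    Collections.sort(jobs, new Comparator<PendingJob>() {
      public int compare(PendingJob a, PendingJob b) {
        return a.runningTasks - b.runningTasks;
      }
    });
    System.out.println("Fair sharing gives it to:    " + jobs.get(0).name);
  }
}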

 A good start is the book that Prashant mentioned...

On 23 April 2012 23:49, Prashant Kommireddi prash1...@gmail.com wrote:

 Shailesh, there's a lot that goes into distributing work across
 tasks/nodes. It's not just distributing work but also fault-tolerance,
 data locality etc that come into play. It might be good to refer
 Hadoop apache docs or Tom White's definitive guide.

 Sent from my iPhone

 On Apr 23, 2012, at 11:03 AM, Shailesh Samudrala shailesh2...@gmail.com
 wrote:

  Hello,
 
  I am trying to design my own MapReduce Implementation and I want to know
  how hadoop is able to distribute its workload across multiple computers.
  Can anyone shed more light on this? thanks!



Re: Algorithms used in fairscheduler 0.20.205

2012-04-23 Thread Merto Mertek
Anyone?

On 19 April 2012 17:34, Merto Mertek masmer...@gmail.com wrote:

 I could find that the closest document matching the current implementation
 of the fair scheduler is this one:
 http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.html
 by Matei Zaharia et al. Another document, on delay scheduling, can be found
 from 2010..

 a) Is there perhaps a newer documented version of the implementation?
 b) Besides delay scheduling, the copy-compute splitting algorithm and the
 fair-share calculation algorithm, are there any other algorithms that are
 important for cluster performance and fair sharing?
 c) Is there any connection between copy-compute splitting and the MapReduce
 phases (copy-sort-reduce)?

 Thank you..



Algorithms used in fairscheduler 0.20.205

2012-04-19 Thread Merto Mertek
I could find that the closest document matching the current implementation
of the fair scheduler is this one:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.html
by Matei Zaharia et al. Another document, on delay scheduling, can be found
from 2010..

a) Is there perhaps a newer documented version of the implementation?
b) Besides delay scheduling, the copy-compute splitting algorithm and the
fair-share calculation algorithm, are there any other algorithms that are
important for cluster performance and fair sharing?
c) Is there any connection between copy-compute splitting and the MapReduce
phases (copy-sort-reduce)?

Thank you..


Fairscheduler - disable default pool

2012-03-13 Thread Merto Mertek
I know that by design all unassigned jobs go to that pool; however, I am
doing some testing and would like to know whether it is possible to disable
it..

Thanks


Re: Fairscheduler - disable default pool

2012-03-13 Thread Merto Mertek
Thanks for the workaround, but I think this just puts a constraint on the
pool so that it will not accept any jobs. I am doing some calculations with
weights and do not want the default pool's weight to be included in the
computation..
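For reference, a sketch of an allocations file that applies J-D's trick (the
pool name and limits below are examples only): the per-pool default is capped
at zero jobs, so only pools that explicitly override it can run anything.

<?xml version="1.0"?>
<allocations>
  <!-- jobs left in the default pool never run -->
  <poolMaxJobsDefault>0</poolMaxJobsDefault>
  <pool name="research">
    <maxRunningJobs>5</maxRunningJobs>
  </pool>
</allocations>

As noted above, this only starves the default pool; its weight still takes
part in the fair-share computation.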


On 13 March 2012 18:52, Jean-Daniel Cryans jdcry...@apache.org wrote:

 We do it here by setting this:

 <poolMaxJobsDefault>0</poolMaxJobsDefault>

 So that you _must_ have a pool (that's configured with a different
 maxRunningJobs) in order to run jobs.

 Hope this helps,

 J-D

 On Tue, Mar 13, 2012 at 10:49 AM, Merto Mertek masmer...@gmail.com
 wrote:
  I know that by design all unassigned jobs go to that pool; however, I am
  doing some testing and would like to know whether it is possible to disable
  it..
 
  Thanks



Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Merto Mertek
From the fairscheduler docs I assume the following should work:

<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>

<property>
  <name>pool.name</name>
  <value>${mapreduce.job.group.name}</value>
</property>

which means that the default pool will be the group of the user that
submitted the job. In your case I think the allocations.xml is correct. If
you want to explicitly assign a job to a specific pool defined in your
allocations.xml file, you can do it as follows:

Configuration conf3 = conf;
conf3.set("pool.name", "pool3"); // conf.set("property.name", "value")

Let me know if it works..
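A minimal end-to-end sketch of submitting a job to a specific pool with the
old mapred API (the class name, pool name and paths are only examples, and it
assumes the poolnameproperty setting shown above):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitToPool {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitToPool.class);
    conf.setJobName("job-in-pool3");

    // Routes the job to "pool3"; this only works if
    // mapred.fairscheduler.poolnameproperty is set to "pool.name" as above.
    conf.set("pool.name", "pool3");

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // Mapper/reducer classes are omitted; the identity defaults are used.
    JobClient.runJob(conf);
  }
}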


On 29 February 2012 14:18, Austin Chungath austi...@gmail.com wrote:

 How can I set the fair scheduler such that all jobs submitted from a
 particular user group go to a pool with the group name?

 I have setup fair scheduler and I have two users: A and B (belonging to the
 user group hadoop)

 When these users submit hadoop jobs, the jobs from A go to a pool named A
 and the jobs from B go to a pool named B.
  I want them to go to a pool with their group name, So I tried adding the
 following to mapred-site.xml:

 <property>
   <name>mapred.fairscheduler.poolnameproperty</name>
   <value>group.name</value>
 </property>

 But instead the jobs now go to the default pool.
 I want the jobs submitted by A and B to go to the pool named hadoop. How
 do I do that?
 also how can I explicity set a job to any specified pool?

 I have set the allocation file (fair-scheduler.xml) like this:

 <allocations>
   <pool name="hadoop">
     <minMaps>1</minMaps>
     <minReduces>1</minReduces>
     <maxMaps>3</maxMaps>
     <maxReduces>3</maxReduces>
   </pool>
   <userMaxJobsDefault>5</userMaxJobsDefault>
 </allocations>

 Any help is greatly appreciated.
 Thanks,
 Austin



Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-29 Thread Merto Mertek
Varun, sorry for my late response. Today I deployed a new version and I can
confirm that the patches you provided work well. I've been running jobs on a
5-node cluster under full load for an hour without a core dump, so now things
work as expected.

Thank you again!

I used just your first option..

On 15 February 2012 19:53, mete efk...@gmail.com wrote:

 Well rebuilding ganglia seemed easier and Merto was testing the other so i
 though that i should give that one a chance :)
 anyway i will send you gdb details or patch hadoop and try it at my
 earliest convenience

 Cheers

 On Wed, Feb 15, 2012 at 6:59 PM, Varun Kapoor rez...@hortonworks.com
 wrote:

  The warnings about underflow are totally expected (they come from
 strtod(),
  and they will no longer occur with Hadoop-1.0.1, which applies my patch
  from HADOOP-8052), so that's not worrisome.
 
  As for the buffer overflow, do you think you could show me a backtrace of
  this core? If you can't find the core file on disk, just start gmetad
 under
  gdb, like so:
 
  $ sudo gdb <path to gmetad>
 
  (gdb) r --conf=<path to your gmetad.conf>
  ...
  ::Wait for crash::
  (gdb) bt
  (gdb) info locals
 
  If you're familiar with gdb, then I'd appreciate any additional diagnosis
  you could perform (for example, to figure out which metric's value caused
  this buffer overflow) - if you're not, I'll try and send you some gdb
  scripts to narrow things down once I see the output from this round of
  debugging.
 
  Also, out of curiosity, is patching Hadoop not an option for you? Or is
 it
  just that rebuilding (and redeploying) ganglia is the lesser of the 2
  evils? :)
 
  Varun
 
  On Tue, Feb 14, 2012 at 11:43 PM, mete efk...@gmail.com wrote:
 
   Hello Varun,
   i have patched and recompiled ganglia from source bit it still cores
  after
   the patch.
  
   Here are some logs:
   Feb 15 09:39:14 master gmetad[16487]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd):
  
  
 
 /var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd:
   converting '4.9E-324' to float: Numerical result out of range
   Feb 15 09:39:14 master gmetad[16487]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd):
  
  
 
 /var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd:
   converting '4.9E-324' to float: Numerical result out of range
   Feb 15 09:39:14 master gmetad[16487]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd):
  
  
 
 /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd:
   converting '4.9E-324' to float: Numerical result out of range
   Feb 15 09:39:14 master gmetad[16487]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
  
  
 
 /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd:
   converting '4.9E-324' to float: Numerical result out of range
   Feb 15 09:39:14 master gmetad[16487]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd):
  
  
 
 /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd:
   converting '4.9E-324' to float: Numerical result out of range
   Feb 15 09:39:14 master gmetad[16487]: *** buffer overflow detected ***:
   gmetad terminated
  
   i am using hadoop.1.0.0 and ganglia 3.20 tarball.
  
   Cheers
   Mete
  
   On Sat, Feb 11, 2012 at 2:19 AM, Merto Mertek masmer...@gmail.com
  wrote:
  
Varun unfortunately I have had some problems with deploying a new
  version
on the cluster.. Hadoop is not picking the new build in lib folder
   despite
a classpath is set to it. The new build is picked just if I put it in
  the
$HD_HOME/share/hadoop/, which is very strange.. I've done this on all
   nodes
and can access the web, but all tasktracker are being stopped because
  of
   an
error:
   
INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager:
   Cleanup...
 java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at

   
  
 
 org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926)

   
   
Probably the error is the consequence of an inadequate deploy of a
  jar..
   I
will ask to the dev list how they do it or are you maybe having any
  other
idea?
   
   
   
On 10 February 2012 17:10, Varun Kapoor rez...@hortonworks.com
  wrote:
   
 Hey Merto,

 Any luck getting the patch running on your cluster?

 In case you're interested, there's now a JIRA for this:
 https://issues.apache.org/jira/browse/HADOOP-8052.

 Varun

 On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor 
 rez

Re: Tasktracker fails

2012-02-22 Thread Merto Mertek
Hm.. I would first try to stop all the daemons with
$HADOOP_HOME/bin/stop-all.sh. Afterwards, check that no daemons are still
running on the master and on one of the slaves (jps). Maybe you could also
check that the tasktrackers' configuration points at the right jobtracker
(mapred-site.xml). Do you see any errors in the jobtracker log too?
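Roughly, those checks look like this on the command line (paths assume a
stock tarball layout and may differ on your install):

$HADOOP_HOME/bin/stop-all.sh                                    # on the master: stops all HDFS and MapReduce daemons
jps                                                             # on master and slaves: no NameNode/DataNode/JobTracker/TaskTracker should remain
grep -A 1 mapred.job.tracker $HADOOP_HOME/conf/mapred-site.xml  # tasktrackers must point at the real jobtracker host:port
tail -n 100 $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log         # look for bind/connect errors on the jobtracker side as well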


On 22 February 2012 09:44, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 Any update on the below issue.

 Thanks


 Adarsh Sharma wrote:

 Dear all,

 Today I am trying  to configure hadoop-0.20.205.0 on a 4  node Cluster.
 When I start my cluster , all daemons got started except tasktracker,
 don't know why task tracker fails due to following error logs.

 Cluster is in private network.My /etc/hosts file contains all IP hostname
 resolution commands in all  nodes.

 2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter:
 MBean for source TaskTrackerMetrics registered.
 2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker:
 Can not start task tracker because java.net.SocketException: Invalid argument
   at sun.nio.ch.Net.bind(Native Method)
   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
   at org.apache.hadoop.ipc.Server.bind(Server.java:225)
   at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301)
   at org.apache.hadoop.ipc.Server.init(Server.java:1483)
   at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545)
   at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
   at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
   at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1428)
   at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)

 Any comments on the issue.


 Thanks





Re: Tasktracker fails

2012-02-22 Thread Merto Mertek
I do not know exactly how the distribution and splitting of deflate files
works, if that is your question, but you will probably find something useful
in the *Codec classes, where the implementations of the various compression
formats live. Deflate files are just one type of compressed file that you can
use for storing data in your system. There are several other types, depending
on your needs and the trade-offs you are dealing with (space versus
compression time).

Globs, I think, are just a matching strategy for matching files/folders with
wildcard patterns..
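As a small illustration of the glob API (the directory and pattern below are
hypothetical), listing .deflate outputs from Java looks roughly like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListDeflateFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    // Match every compressed part file under a (hypothetical) job output directory.
    FileStatus[] matches = fs.globStatus(new Path("/user/jay/output/part-*.deflate"));
    if (matches != null) {
      for (FileStatus status : matches) {
        System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
      }
    }
  }
}

Note that the matched files are still compressed; to read their contents you
would go through the matching codec, e.g. via CompressionCodecFactory.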


On 22 February 2012 19:29, Jay Vyas jayunit...@gmail.com wrote:

 Hi guys !

 Im trying to understand the way globstatus / deflate files work in hdfs.  I
 cant read them using the globStatus API in the hadoop FileSystem , from
 java.  the specifics are here if anyone wants some easy stackoverflow
 points :)


 http://stackoverflow.com/questions/9400739/hadoop-globstatus-and-deflate-files

 On Wed, Feb 22, 2012 at 7:39 AM, Merto Mertek masmer...@gmail.com wrote:

  Hm.. I would try first to stop all the deamons wtih
  $haddop_home/bin/stop-all.sh. Afterwards check that on the master and one
  of the slaves no deamons are running (jps). Maybe you could try to check
 if
  your conf on tasktrackers for the jobtracker is pointing to the right
 place
  (mapred-site.xml). Do you see any error in the jobtracker log too?
 
 
  On 22 February 2012 09:44, Adarsh Sharma adarsh.sha...@orkash.com
 wrote:
 
   Any update on the below issue.
  
   Thanks
  
  
   Adarsh Sharma wrote:
  
   Dear all,
  
   Today I am trying  to configure hadoop-0.20.205.0 on a 4  node
 Cluster.
   When I start my cluster , all daemons got started except tasktracker,
   don't know why task tracker fails due to following error logs.
  
   Cluster is in private network.My /etc/hosts file contains all IP
  hostname
   resolution commands in all  nodes.
  
    2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter:
    MBean for source TaskTrackerMetrics registered.
    2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker:
    Can not start task tracker because java.net.SocketException: Invalid argument
      at sun.nio.ch.Net.bind(Native Method)
      at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
      at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
      at org.apache.hadoop.ipc.Server.bind(Server.java:225)
      at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301)
      at org.apache.hadoop.ipc.Server.init(Server.java:1483)
      at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545)
      at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
      at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
      at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1428)
      at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)
  
   Any comments on the issue.
  
  
   Thanks
  
  
  
 



 --
 Jay Vyas
 MMSB/UCHC



Re: Dynamic changing of slaves

2012-02-21 Thread Merto Mertek
I think that the job configuration does not allow such a setup, but maybe I
missed something..

I would probably tackle this problem from the scheduler source. The default
scheduler is JobQueueTaskScheduler, which maintains a FIFO-based queue. When
a tasktracker (your slave) tells the jobtracker that it has free slots, the
JT's heartbeat method calls the scheduler's assignTasks method, where tasks
are assigned on a locality basis. In other words, the scheduler tries to find
tasks whose input data resides on that tasktracker. If the scheduler does not
find a local map/reduce task to run, it tries to find a non-local one. This
is probably the point where you should intervene for your jobs while waiting
for the tasktracker's heartbeat. Instead of waiting for the TT heartbeat,
there may be a way to force a heartbeatResponse even though the TT has not
sent a heartbeat, but I am not aware of one..
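A rough skeleton (not working code, and not from this thread) of where such
logic would live. It assumes the branch-0.20.20x/1.x scheduler API, in which
assignTasks receives a jobtracker-side TaskTracker object; older 0.20
releases pass a TaskTrackerStatus instead, so check TaskScheduler in your own
source tree first:

// Sketch only. Contrib schedulers sit in this package because JobInProgress,
// Task and friends are package-private.
package org.apache.hadoop.mapred;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.mapreduce.server.jobtracker.TaskTracker;

public class SelectiveTaskScheduler extends TaskScheduler {

  @Override
  public List<Task> assignTasks(TaskTracker tracker) throws IOException {
    // The heartbeat tells us which tracker is asking and how many free slots it has.
    TaskTrackerStatus status = tracker.getStatus();
    // Decide here whether this tracker should get work at all (e.g. rotate
    // through the slaves), then obtain tasks from the chosen JobInProgress via
    // obtainNewMapTask()/obtainNewReduceTask(), as JobQueueTaskScheduler does.
    return Collections.<Task>emptyList();   // returning nothing leaves the tracker idle
  }

  @Override
  public Collection<JobInProgress> getJobs(String queueName) {
    return Collections.<JobInProgress>emptyList();
  }
}

Such a class would be plugged in through the mapred.jobtracker.taskScheduler
property in mapred-site.xml, the same property the fair and capacity
schedulers use.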


On 21 February 2012 19:27, theta glynisdso...@email.arizona.edu wrote:


 Hi,

 I am working on a project which requires a setup as follows:

 One master with four slaves.However, when a map only program is run, the
 master dynamically selects the slave to run the map. For example, when the
 program is run for the first time, slave 2 is selected to run the map and
 reduce programs, and the output is stored on dfs. When the program is run
 the second time, slave 3 is selected and son on.

 I am currently using Hadoop 0.20.2 with Ubuntu 11.10.

 Any ideas on creating the setup as described above?

 Regards

 --
 View this message in context:
 http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-10 Thread Merto Mertek
Varun, unfortunately I have had some problems with deploying the new version
on the cluster.. Hadoop is not picking up the new build from the lib folder
despite the classpath being set to it. The new build is picked up only if I
put it in $HD_HOME/share/hadoop/, which is very strange.. I've done this on
all nodes and can access the web UI, but all tasktrackers are being stopped
because of an error:

INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Cleanup...
 java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at
 org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926)



The error is probably the consequence of an incorrect jar deployment.. I will
ask the dev list how they do it, or do you maybe have any other idea?



On 10 February 2012 17:10, Varun Kapoor rez...@hortonworks.com wrote:

 Hey Merto,

 Any luck getting the patch running on your cluster?

 In case you're interested, there's now a JIRA for this:
 https://issues.apache.org/jira/browse/HADOOP-8052.

 Varun

 On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor rez...@hortonworks.com
 wrote:

  Your general procedure sounds correct (i.e. dropping your newly built
 .jar
  into $HD_HOME/lib/), but to make sure it's getting picked up, you should
  explicitly add $HD_HOME/lib/ to your exported HADOOP_CLASSPATH
 environment
  variable; here's mine, as an example:
 
  export HADOOP_CLASSPATH=.:./build/*.jar
 
  About your second point, you certainly need to copy this newly patched
  .jar to every node in your cluster, because my patch changes the value
 of a
  couple metrics emitted TO gmetad (FROM all the nodes in the cluster), so
  without copying it over to every node in the cluster, gmetad will still
  likely receive some bad metrics.
 
  Varun
 
 
  On Wed, Feb 8, 2012 at 6:19 PM, Merto Mertek masmer...@gmail.com
 wrote:
 
  I will need your help. Please confirm if the following procedure is
 right.
  I have a dev environment where I pimp my scheduler (no hadoop running)
 and
  a small cluster environment where the changes(jars) are deployed with
 some
  scripts,  however I have never compiled the whole hadoop from source so
 I
  do not know if I am doing it right. I' ve done it as follow:
 
  a) apply a patch
  b) cd $HD_HOME; ant
  c) copy $HD_HOME/*build*/patched-core-hadoop.jar -
  cluster:/$HD_HOME/*lib*
  d) run $HD_HOME/bin/start-all.sh
 
  Is this enough? When I tried to test hadoop dfs -ls / I could see
 that a
  new jar was not loaded and instead a jar from
  $HD_HOME/*share*/hadoop-20.205.0.jar
  was taken..
  Should I copy the entire hadoop folder to all nodes and reconfigure the
  entire cluster for the new build, or is enough if I configure it just on
  the node where gmetad will run?
 
 
 
 
 
 
  On 8 February 2012 06:33, Varun Kapoor rez...@hortonworks.com wrote:
 
   I'm so sorry, Merto - like a silly goose, I attached the 2 patches to
 my
   reply, and of course the mailing list did not accept the attachment.
  
   I plan on opening JIRAs for this tomorrow, but till then, here are
  links to
   the 2 patches (from my Dropbox account):
  
 - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.Hadoop.patch
 - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.gmetad.patch
  
   Here's hoping this works for you,
  
   Varun
   On Tue, Feb 7, 2012 at 6:00 PM, Merto Mertek masmer...@gmail.com
  wrote:
  
Varun, have I missed your link to the patches? I have tried to
 search
   them
on jira but I did not find them.. Can you repost the link for these
  two
patches?
   
Thank you..
   
On 7 February 2012 20:36, Varun Kapoor rez...@hortonworks.com
  wrote:
   
 I'm sorry to hear that gmetad cores continuously for you guys.
 Since
   I'm
 not seeing that behavior, I'm going to just put out the 2 possible
patches
 you could apply and wait to hear back from you. :)

 Option 1

 * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file (

   
  
 
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markupinmysetup
 )
  in your Hadoop sources and rebuild Hadoop.

 Option 2

 * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c
  and
 rebuild gmetad.

 Only 1 of these 2 fixes is required, and it would help me if you
  could
 first try Option 1 and let me know if that fixes things for you.

 Varun

 On Mon, Feb 6, 2012 at 10:36 PM, mete efk...@gmail.com wrote:

 Same with Merto's situation here, it always overflows short time
  after
the
 restart. Without the hadoop metrics enabled everything is smooth.
 Regards

 Mete

 On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek 
 masmer...@gmail.com
wrote:

  I have tried to run it but it repeats crashing..
 
   - When you start gmetad

Compile and deploy a new version of hadoop

2012-02-09 Thread Merto Mertek
I am having some trouble understanding how the whole thing works..

Compiling with ant works OK and I am able to build a jar which is afterwards
deployed to the cluster. On the cluster I've set the HADOOP_CLASSPATH
variable to point just to the jar files in the lib folder
($HD_HOME/lib/*.jar), where I put the newly compiled
hadoop-core-myversion.jar.

Before deploying I make sure that there is no previous version of
hadoop-core-xxx.jar or core-3.3.1.jar in the $HD_HOME folder or in
$HD_HOME/lib. The problem is that I suspect Hadoop is picking up the wrong
hadoop-core jar, so I am interested in how the whole mechanism works and what
the purpose of the $HD_HOME/share/hadoop folder is, where I can find other
hadoop-core jars and which is included in the classpath in hadoop-env.sh.


My last question is: what is the easiest way to see that your build is the
one actually up and running? Maybe from the version tag in the JT web UI?
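For a quick sanity check of which build the daemons actually see, something
like this helps (the classpath subcommand may not exist in every 0.20
release, so treat it as optional):

$HD_HOME/bin/hadoop version      # prints the version, svn revision and build time of the core jar in use
$HD_HOME/bin/hadoop classpath    # if available, prints the expanded classpath in the order it is searched

The jobtracker and namenode web UIs also print the same version/revision
string at the bottom of their pages, which answers the release-tag question.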

Thank you..


Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-08 Thread Merto Mertek
I will need your help. Please confirm whether the following procedure is
right. I have a dev environment where I work on my scheduler (no Hadoop
running) and a small cluster environment where the changes (jars) are
deployed with some scripts; however, I have never compiled the whole of
Hadoop from source, so I do not know if I am doing it right. I've done it as
follows:

a) apply a patch
b) cd $HD_HOME; ant
c) copy $HD_HOME/*build*/patched-core-hadoop.jar -> cluster:/$HD_HOME/*lib*
d) run $HD_HOME/bin/start-all.sh

Is this enough? When I tried to test hadoop dfs -ls / I could see that the
new jar was not loaded and a jar from $HD_HOME/*share*/hadoop-20.205.0.jar
was taken instead..
Should I copy the entire Hadoop folder to all nodes and reconfigure the
entire cluster for the new build, or is it enough if I configure it just on
the node where gmetad will run?






On 8 February 2012 06:33, Varun Kapoor rez...@hortonworks.com wrote:

 I'm so sorry, Merto - like a silly goose, I attached the 2 patches to my
 reply, and of course the mailing list did not accept the attachment.

 I plan on opening JIRAs for this tomorrow, but till then, here are links to
 the 2 patches (from my Dropbox account):

   - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.Hadoop.patch
   - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.gmetad.patch

 Here's hoping this works for you,

 Varun
 On Tue, Feb 7, 2012 at 6:00 PM, Merto Mertek masmer...@gmail.com wrote:

  Varun, have I missed your link to the patches? I have tried to search
 them
  on jira but I did not find them.. Can you repost the link for these two
  patches?
 
  Thank you..
 
  On 7 February 2012 20:36, Varun Kapoor rez...@hortonworks.com wrote:
 
   I'm sorry to hear that gmetad cores continuously for you guys. Since
 I'm
   not seeing that behavior, I'm going to just put out the 2 possible
  patches
   you could apply and wait to hear back from you. :)
  
   Option 1
  
   * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file (
  
 
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markupinmysetup)
  in your Hadoop sources and rebuild Hadoop.
  
   Option 2
  
   * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and
   rebuild gmetad.
  
   Only 1 of these 2 fixes is required, and it would help me if you could
   first try Option 1 and let me know if that fixes things for you.
  
   Varun
  
   On Mon, Feb 6, 2012 at 10:36 PM, mete efk...@gmail.com wrote:
  
   Same with Merto's situation here, it always overflows short time after
  the
   restart. Without the hadoop metrics enabled everything is smooth.
   Regards
  
   Mete
  
   On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek masmer...@gmail.com
  wrote:
  
I have tried to run it but it repeats crashing..
   
 - When you start gmetad and Hadoop is not emitting metrics,
  everything
   is peachy.

   
Right, running just ganglia without running hadoop jobs seems stable
   for at
least a day..
   
   
   - When you start Hadoop (and it thus starts emitting metrics),
   gmetad
   cores.

   
True, with a  following error : *** stack smashing detected ***:
  gmetad
terminated \n Segmentation fault
   
- On my MacBookPro, it's a SIGABRT due to a buffer overflow.

 I believe this is happening for everyone. What I would like for
 you
  to
try
 out are the following 2 scenarios:

   - Once gmetad cores, if you start it up again, does it core
 again?
   Does
   this process repeat ad infinitum?

- On my MBP, the core is a one-time thing, and restarting gmetad
  after the first core makes things run perfectly smoothly.
 - I know others are saying this core occurs continuously,
  but
they
 were all using ganglia-3.1.x, and I'm interested in how
 ganglia-3.2.0
 behaves for you.

   
It cores everytime I run it. The difference is just that sometimes a
segmentation faults appears instantly, and sometimes it appears
 after
  a
random time...lets say after a minute of running gmetad and
 collecting
data.
   
   
 - If you start Hadoop first (so gmetad is not running when
  the
   first batch of Hadoop metrics are emitted) and THEN start gmetad
   after
a
   few seconds, do you still see gmetad coring?

   
Yes
   
   
  - On my MBP, this sequence works perfectly fine, and there
 are
  no
  gmetad cores whatsoever.

   
I have tested this scenario with 2 working nodes so two gmond plus
 the
   head
gmond on the server where gmetad is located. I have checked and all
 of
   them
are versioned 3.2.0.
   
Hope it helps..
   
   
   

 Bear in mind that this only addresses the gmetad coring issue -
 the
 warnings emitted about '4.9E-324' being out of range will
 continue,
   but I
 know what's causing

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-07 Thread Merto Mertek
Varun, have I missed your link to the patches? I tried to search for them on
JIRA but I did not find them.. Can you repost the links to these two patches?

Thank you..

On 7 February 2012 20:36, Varun Kapoor rez...@hortonworks.com wrote:

 I'm sorry to hear that gmetad cores continuously for you guys. Since I'm
 not seeing that behavior, I'm going to just put out the 2 possible patches
 you could apply and wait to hear back from you. :)

 Option 1

 * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file (
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markupin
  my setup) in your Hadoop sources and rebuild Hadoop.

 Option 2

 * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and
 rebuild gmetad.

 Only 1 of these 2 fixes is required, and it would help me if you could
 first try Option 1 and let me know if that fixes things for you.

 Varun

 On Mon, Feb 6, 2012 at 10:36 PM, mete efk...@gmail.com wrote:

 Same with Merto's situation here, it always overflows short time after the
 restart. Without the hadoop metrics enabled everything is smooth.
 Regards

 Mete

 On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek masmer...@gmail.com wrote:

  I have tried to run it but it repeats crashing..
 
   - When you start gmetad and Hadoop is not emitting metrics, everything
 is peachy.
  
 
  Right, running just ganglia without running hadoop jobs seems stable
 for at
  least a day..
 
 
 - When you start Hadoop (and it thus starts emitting metrics),
 gmetad
 cores.
  
 
  True, with a  following error : *** stack smashing detected ***: gmetad
  terminated \n Segmentation fault
 
  - On my MacBookPro, it's a SIGABRT due to a buffer overflow.
  
   I believe this is happening for everyone. What I would like for you to
  try
   out are the following 2 scenarios:
  
 - Once gmetad cores, if you start it up again, does it core again?
 Does
 this process repeat ad infinitum?
  
  - On my MBP, the core is a one-time thing, and restarting gmetad
after the first core makes things run perfectly smoothly.
   - I know others are saying this core occurs continuously, but
  they
   were all using ganglia-3.1.x, and I'm interested in how
   ganglia-3.2.0
   behaves for you.
  
 
  It cores everytime I run it. The difference is just that sometimes a
  segmentation faults appears instantly, and sometimes it appears after a
  random time...lets say after a minute of running gmetad and collecting
  data.
 
 
   - If you start Hadoop first (so gmetad is not running when the
 first batch of Hadoop metrics are emitted) and THEN start gmetad
 after
  a
 few seconds, do you still see gmetad coring?
  
 
  Yes
 
 
- On my MBP, this sequence works perfectly fine, and there are no
gmetad cores whatsoever.
  
 
  I have tested this scenario with 2 working nodes so two gmond plus the
 head
  gmond on the server where gmetad is located. I have checked and all of
 them
  are versioned 3.2.0.
 
  Hope it helps..
 
 
 
  
   Bear in mind that this only addresses the gmetad coring issue - the
   warnings emitted about '4.9E-324' being out of range will continue,
 but I
   know what's causing that as well (and hope that my patch fixes it for
   free).
  
   Varun
   On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek masmer...@gmail.com
  wrote:
  
Yes I am encoutering the same problems and like Mete said  few
 seconds
after restarting a segmentation fault appears.. here is my conf..
http://pastebin.com/VgBjp08d
   
And here are some info from /var/log/messages (ubuntu server 10.10):
   
kernel: [424447.140641] gmetad[26115] general protection
  ip:7f7762428fdb
 sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]

   
When I compiled gmetad I used the following command:
   
./configure --with-gmetad --sysconfdir=/etc/ganglia
 CPPFLAGS=-I/usr/local/rrdtool-1.4.7/include
 CFLAGS=-I/usr/local/rrdtool-1.4.7/include
 LDFLAGS=-L/usr/local/rrdtool-1.4.7/lib

   
The same was tried with rrdtool 1.4.5. My current ganglia version is
   3.2.0
and like Mete I tried it with version 3.1.7 but without success..
   
Hope we will sort it out soon any solution..
thank you
   
   
On 6 February 2012 20:09, mete efk...@gmail.com wrote:
   
 Hello,
 i also face this issue when using GangliaContext31 and
 hadoop-1.0.0,
   and
 ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer
 overflows
   as
 soon as i restart the gmetad.
 Regards
 Mete

 On Mon, Feb 6, 2012 at 7:42 PM, Vitthal Suhas Gogate 
 gog...@hortonworks.com wrote:

  I assume you have seen the following information on Hadoop
 twiki,
  http://wiki.apache.org/hadoop/GangliaMetrics
 
  So do you use GangliaContext31 in hadoop-metrics2.properties?
 
  We use Ganglia 3.2 with Hadoop

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-06 Thread Merto Mertek
Yes, I am encountering the same problems, and like Mete said, a few seconds
after restarting a segmentation fault appears.. here is my conf:
http://pastebin.com/VgBjp08d

And here are some info from /var/log/messages (ubuntu server 10.10):

kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb
 sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]


When I compiled gmetad I used the following command:

./configure --with-gmetad --sysconfdir=/etc/ganglia
 CPPFLAGS=-I/usr/local/rrdtool-1.4.7/include
 CFLAGS=-I/usr/local/rrdtool-1.4.7/include
 LDFLAGS=-L/usr/local/rrdtool-1.4.7/lib


The same was tried with rrdtool 1.4.5. My current Ganglia version is 3.2.0,
and like Mete I also tried version 3.1.7, but without success..

Hope we will sort out a solution soon..
thank you


On 6 February 2012 20:09, mete efk...@gmail.com wrote:

 Hello,
 i also face this issue when using GangliaContext31 and hadoop-1.0.0, and
 ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows as
 soon as i restart the gmetad.
 Regards
 Mete

 On Mon, Feb 6, 2012 at 7:42 PM, Vitthal Suhas Gogate 
 gog...@hortonworks.com wrote:

  I assume you have seen the following information on Hadoop twiki,
  http://wiki.apache.org/hadoop/GangliaMetrics
 
  So do you use GangliaContext31 in hadoop-metrics2.properties?
 
  We use Ganglia 3.2 with Hadoop 20.205  and works fine (I remember seeing
  gmetad sometime goes down due to buffer overflow problem when hadoop
 starts
  pumping in the metrics.. but restarting works.. let me know if you face
  same problem?
 
  --Suhas
 
  Additionally, the Ganglia protocol change significantly between Ganglia
 3.0
  and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with Ganglia 3.0
  clients). This caused Hadoop to not work with Ganglia 3.1; there is a
 patch
  available for this, HADOOP-4675. As of November 2010, this patch has been
  rolled into the mainline for 0.20.2 and later. To use the Ganglia 3.1
  protocol in place of the 3.0, substitute
  org.apache.hadoop.metrics.ganglia.GangliaContext31 for
  org.apache.hadoop.metrics.ganglia.GangliaContext in the
  hadoop-metrics.properties lines above.
 
  On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek masmer...@gmail.com
 wrote:
 
   I spent a lot of time to figure it out however i did not find a
 solution.
   Problems from the logs pointed me for some bugs in rrdupdate tool,
  however
   i tried to solve it with different versions of ganglia and rrdtool but
  the
   error is the same. Segmentation fault appears after the following
 lines,
  if
   I run gmetad in debug mode...
  
   Created rrd
  
  
 
 /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd
   Created rrd
  
  
 
 /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd
   
  
   which I suppose are generated from MetricsSystemImpl.java (Is there any
  way
   just to disable this two metrics?)
  
   From the /var/log/messages there are a lot of errors:
  
   xxx gmetad[15217]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd):
   converting  '4.9E-324' to float: Numerical result out of range
   xxx gmetad[15217]: RRD_update
  
  
 
 (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
   converting  '4.9E-324' to float: Numerical result out of range
  
   so probably there are some converting issues ? Where should I look for
  the
   solution? Would you rather suggest to use ganglia 3.0.x with the old
   protocol and leave the version 3.1 for further releases?
  
   any help is realy appreciated...
  
   On 1 February 2012 04:04, Merto Mertek masmer...@gmail.com wrote:
  
I would be glad to hear that too.. I've setup the following:
   
Hadoop 0.20.205
Ganglia Front  3.1.7
Ganglia Back *(gmetad)* 3.1.7
RRDTool http://www.rrdtool.org/ 1.4.5. - i had some troubles
installing 1.4.4
   
Ganglia works just in case hadoop is not running, so metrics are not
publshed to gmetad node (conf with new hadoop-metrics2.proprieties).
  When
hadoop is started, a segmentation fault appears in gmetad deamon:
   
sudo gmetad -d 2
...
Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
Updating host xxx, metric bytes_in
Updating host xxx, metric bytes_out
Updating host xxx, metric
 metricssystem.MetricsSystem.publish_max_time
Created rrd
   
  
 
 /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
Segmentation fault
   
And some info from the apache log http://pastebin.com/nrqKRtKJ..
   
Can someone suggest a ganglia version that is tested with hadoop
   0.20.205?
I will try to sort it out however it seems a not so tribial problem..
   
Thank you
   
   
   
   
   
On 2 December 2011 12:32, praveenesh kumar praveen...@gmail.com
  wrote:
   
or Do I have to apply some hadoop patch

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-06 Thread Merto Mertek
I have tried to run it but it keeps crashing..

  - When you start gmetad and Hadoop is not emitting metrics, everything
   is peachy.


Right, running just ganglia without running hadoop jobs seems stable for at
least a day..


   - When you start Hadoop (and it thus starts emitting metrics), gmetad
   cores.


True, with the following error: *** stack smashing detected ***: gmetad
terminated \n Segmentation fault

 - On my MacBookPro, it's a SIGABRT due to a buffer overflow.

 I believe this is happening for everyone. What I would like for you to try
 out are the following 2 scenarios:

   - Once gmetad cores, if you start it up again, does it core again? Does
   this process repeat ad infinitum?

 - On my MBP, the core is a one-time thing, and restarting gmetad
  after the first core makes things run perfectly smoothly.
 - I know others are saying this core occurs continuously, but they
 were all using ganglia-3.1.x, and I'm interested in how
 ganglia-3.2.0
 behaves for you.


It cores every time I run it. The difference is just that sometimes the
segmentation fault appears instantly, and sometimes it appears after a random
time... let's say after a minute of running gmetad and collecting data.


 - If you start Hadoop first (so gmetad is not running when the
   first batch of Hadoop metrics are emitted) and THEN start gmetad after a
   few seconds, do you still see gmetad coring?


Yes


  - On my MBP, this sequence works perfectly fine, and there are no
  gmetad cores whatsoever.


I have tested this scenario with 2 worker nodes, so two gmonds plus the head
gmond on the server where gmetad is located. I have checked, and all of them
are version 3.2.0.

Hope it helps..




 Bear in mind that this only addresses the gmetad coring issue - the
 warnings emitted about '4.9E-324' being out of range will continue, but I
 know what's causing that as well (and hope that my patch fixes it for
 free).

 Varun
 On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek masmer...@gmail.com wrote:

  Yes I am encoutering the same problems and like Mete said  few seconds
  after restarting a segmentation fault appears.. here is my conf..
  http://pastebin.com/VgBjp08d
 
  And here are some info from /var/log/messages (ubuntu server 10.10):
 
  kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb
   sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]
  
 
  When I compiled gmetad I used the following command:
 
  ./configure --with-gmetad --sysconfdir=/etc/ganglia
   CPPFLAGS=-I/usr/local/rrdtool-1.4.7/include
   CFLAGS=-I/usr/local/rrdtool-1.4.7/include
   LDFLAGS=-L/usr/local/rrdtool-1.4.7/lib
  
 
  The same was tried with rrdtool 1.4.5. My current ganglia version is
 3.2.0
  and like Mete I tried it with version 3.1.7 but without success..
 
  Hope we will sort it out soon any solution..
  thank you
 
 
  On 6 February 2012 20:09, mete efk...@gmail.com wrote:
 
   Hello,
   i also face this issue when using GangliaContext31 and hadoop-1.0.0,
 and
   ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows
 as
   soon as i restart the gmetad.
   Regards
   Mete
  
   On Mon, Feb 6, 2012 at 7:42 PM, Vitthal Suhas Gogate 
   gog...@hortonworks.com wrote:
  
I assume you have seen the following information on Hadoop twiki,
http://wiki.apache.org/hadoop/GangliaMetrics
   
So do you use GangliaContext31 in hadoop-metrics2.properties?
   
We use Ganglia 3.2 with Hadoop 20.205  and works fine (I remember
  seeing
gmetad sometime goes down due to buffer overflow problem when hadoop
   starts
pumping in the metrics.. but restarting works.. let me know if you
 face
same problem?
   
--Suhas
   
Additionally, the Ganglia protocol change significantly between
 Ganglia
   3.0
and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with Ganglia 3.0
clients). This caused Hadoop to not work with Ganglia 3.1; there is a
   patch
available for this, HADOOP-4675. As of November 2010, this patch has
  been
rolled into the mainline for 0.20.2 and later. To use the Ganglia 3.1
protocol in place of the 3.0, substitute
org.apache.hadoop.metrics.ganglia.GangliaContext31 for
org.apache.hadoop.metrics.ganglia.GangliaContext in the
hadoop-metrics.properties lines above.
   
On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek masmer...@gmail.com
   wrote:
   
 I spent a lot of time to figure it out however i did not find a
   solution.
 Problems from the logs pointed me for some bugs in rrdupdate tool,
however
 i tried to solve it with different versions of ganglia and rrdtool
  but
the
 error is the same. Segmentation fault appears after the following
   lines,
if
 I run gmetad in debug mode...

 Created rrd


   
  
 
 /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd
 Created rrd


   
  
 
 /var/lib/ganglia

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-03 Thread Merto Mertek
I spent a lot of time trying to figure it out, but I did not find a solution.
Problems in the logs pointed me to some bugs in the rrdupdate tool; I tried
different versions of ganglia and rrdtool, but the error is the same. A
segmentation fault appears after the following lines if I run gmetad in debug
mode...

Created rrd
/var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd
Created rrd
/var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd


which I suppose are generated from MetricsSystemImpl.java (is there any way
to disable just these two metrics?)

In /var/log/messages there are a lot of errors:

xxx gmetad[15217]: RRD_update
(/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd):
converting  '4.9E-324' to float: Numerical result out of range
xxx gmetad[15217]: RRD_update
(/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
converting  '4.9E-324' to float: Numerical result out of range

so probably there are some conversion issues? Where should I look for the
solution? Would you rather suggest using ganglia 3.0.x with the old protocol
and leaving version 3.1 for later releases?

any help is really appreciated...

On 1 February 2012 04:04, Merto Mertek masmer...@gmail.com wrote:

 I would be glad to hear that too.. I've setup the following:

 Hadoop 0.20.205
 Ganglia Front  3.1.7
 Ganglia Back *(gmetad)* 3.1.7
 RRDTool http://www.rrdtool.org/ 1.4.5. - i had some troubles
 installing 1.4.4

 Ganglia works just in case hadoop is not running, so metrics are not
 publshed to gmetad node (conf with new hadoop-metrics2.proprieties). When
 hadoop is started, a segmentation fault appears in gmetad deamon:

 sudo gmetad -d 2
 ...
 Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
 Updating host xxx, metric bytes_in
 Updating host xxx, metric bytes_out
 Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time
 Created rrd
 /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
 Segmentation fault

 And some info from the apache log http://pastebin.com/nrqKRtKJ..

 Can someone suggest a ganglia version that is tested with hadoop 0.20.205?
 I will try to sort it out however it seems a not so tribial problem..

 Thank you





 On 2 December 2011 12:32, praveenesh kumar praveen...@gmail.com wrote:

 or Do I have to apply some hadoop patch for this ?

 Thanks,
 Praveenesh





Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-01-31 Thread Merto Mertek
I would be glad to hear that too.. I've set up the following:

Hadoop 0.20.205
Ganglia Front  3.1.7
Ganglia Back *(gmetad)* 3.1.7
RRDTool (http://www.rrdtool.org/) 1.4.5 - I had some trouble installing
1.4.4

Ganglia works only as long as Hadoop is not running, i.e. while no metrics
are published to the gmetad node (configured with the new
hadoop-metrics2.properties). When Hadoop is started, a segmentation fault
appears in the gmetad daemon:

sudo gmetad -d 2
...
Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
Updating host xxx, metric bytes_in
Updating host xxx, metric bytes_out
Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time
Created rrd
/var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
Segmentation fault

And some info from the apache log http://pastebin.com/nrqKRtKJ..

Can someone suggest a Ganglia version that has been tested with Hadoop
0.20.205? I will try to sort it out, however it does not seem to be a trivial
problem..

Thank you





On 2 December 2011 12:32, praveenesh kumar praveen...@gmail.com wrote:

 or Do I have to apply some hadoop patch for this ?

 Thanks,
 Praveenesh



Configure hadoop scheduler

2011-12-20 Thread Merto Mertek
Hi,

I am having problems changing the default Hadoop scheduler (I assume that the
default scheduler is the FIFO scheduler).

I am following the guide located in the hadoop/docs directory, however I am
not able to get it running. The link for scheduling administration returns an
HTTP 404 error (http://localhost:50030/scheduler). In the UI, under
scheduling information, I can see only one queue named 'default'. The
mapred-site.xml file is definitely being read, because when I change the
jobtracker port I can see the daemon running on the changed port. The
variable $HADOOP_CONFIG_DIR was added to .bashrc, but that did not solve the
problem. I tried rebuilding Hadoop, manually placing the fair scheduler jar
in hadoop/lib and changing the Hadoop classpath in hadoop-env.sh to point to
the lib folder, but without success. The only scheduler-related information
in the jobtracker log is the following:

Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
 limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)


I have been working on this for several days and am running out of ideas...
I am wondering how to fix it and where to check the currently active
scheduler parameters.

Config files:
mapred-site.xml http://pastebin.com/HmDfWqE1
allocation.xml http://pastebin.com/Uexq7uHV
Tried versions: 0.20.203 and 204
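For comparison, a minimal mapred-site.xml fragment that switches the
jobtracker to the fair scheduler on 0.20.20x looks like this (the allocation
file path is only an example); it only takes effect if the
hadoop-fairscheduler jar is on the jobtracker's classpath:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/usr/local/hadoop/conf/allocations.xml</value>
</property>

With that in place and the jar loaded, http://localhost:50030/scheduler
should stop returning 404.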

Thank you


Re: Desperate!!!! Expanding,shrinking cluster or replacing failed nodes.

2011-12-20 Thread Merto Mertek
I followed the same tutorial as you. If I am not wrong, the problem arises
because you first tried to run the node as a single node and then joined it
to the cluster (like Arpit mentioned). After testing that the new node works
OK, try deleting the contents of the /app/hadoop/tmp/ directory and then add
the new node to the cluster. When I set up the config files on the new node I
followed this procedure:

DATANODE
setup config files (look the tutorial)
/usr/local/hadoop/bin/hadoop-daemon.sh start datanode
/usr/local/hadoop/bin/hadoop-daemon.sh start tasktracker
---
MASTER
$hdbin/hadoop dfsadmin -report
nano /usr/local/hadoop/conf/slaves (add a new node)
$hdbin/hadoop dfsadmin -refreshNodes
$hdbin/hadoop namenode restart
$hdbin/hadoop jobtracker restart
($hdbin/hadoop balancer  )
($hdbin/hadoop dfsadmin -report )

Hope it helps..

On 20 December 2011 18:38, Arpit Gupta ar...@hortonworks.com wrote:

 On the new nodes you are trying to add make sure the  dfs/data directories
 are empty. You probably have a VERSION file from an older deploy and thus
 causing the incompatible namespaceId error.


 --
 Arpit
 ar...@hortonworks.com


 On Dec 20, 2011, at 5:35 AM, Sloot, Hans-Peter wrote:

 
 
  But I ran into the : java.io.IOException: Incompatible namespaceIDs
 error every time.
  Should I config the files :  dfs/data/current/VERSION and
 dfs/name/current/VERSION  and  conf/*site.xml
  from other existing nodes?
 
 
 
 
 
  -Original Message-
  From: Harsh J [mailto:ha...@cloudera.com]
  Sent: dinsdag 20 december 2011 14:30
  To: common-user@hadoop.apache.org
  Cc: hdfs-...@hadoop.apache.org
  Subject: Re: Desperate Expanding,shrinking cluster or replacing
 failed nodes.
 
  Hans-Peter,
 
  Adding new nodes is simply (assuming network setup is sane and done):
 
  - Install/unpack services on new machine.
  - Deploy a config copy for the services.
  - Start the services.
 
  You should *not* format a NameNode *ever*, after the first time you
 start it up. Formatting loses all data of HDFS, so don't even think about
 that after the first time you use it :)
 
  On 20-Dec-2011, at 6:12 PM, Sloot, Hans-Peter wrote:
 
  Hello all,
 
  I have asked this question a couple of days ago but no one responded.
 
  I built a 6 node hadoop cluster, guided Michael Noll, starting with a
 single node and expanding it one by one.
  Every time I expanded the cluster I ran into error :
 java.io.IOException: Incompatible namespaceIDs
 
  So now my question is what is the correct procedure for expanding,
 shrinking a cluster?
  And how to replace a failed node?
 
  Can someone  point me to the correct manuals.
  I have already looked at the available documents on the wiki and
 hadoop.apache.org but could not find the answers.
 
  Regards Hans-Peter
 
 
 
 
 
  Dit bericht is vertrouwelijk en kan geheime informatie bevatten enkel
 bestemd voor de geadresseerde. Indien dit bericht niet voor u is bestemd,
 verzoeken wij u dit onmiddellijk aan ons te melden en het bericht te
 vernietigen. Aangezien de integriteit van het bericht niet veilig gesteld
 is middels verzending via internet, kan Atos Nederland B.V. niet
 aansprakelijk worden gehouden voor de inhoud daarvan. Hoewel wij ons
 inspannen een virusvrij netwerk te hanteren, geven wij geen enkele garantie
 dat dit bericht virusvrij is, noch aanvaarden wij enige aansprakelijkheid
 voor de mogelijke aanwezigheid van een virus in dit bericht. Op al onze
 rechtsverhoudingen, aanbiedingen en overeenkomsten waaronder Atos Nederland
 B.V. goederen en/of diensten levert zijn met uitsluiting van alle andere
 voorwaarden de Leveringsvoorwaarden van Atos Nederland B.V. van toepassing.
 Deze worden u op aanvraag direct kosteloos toegezonden.
 
  This e-mail and the documents attached are confidential and intended
 solely for the addressee; it may also be privileged. If you receive this
 e-mail in error, please notify the sender immediately and destroy it. As
 its integrity cannot be secured on the Internet, the Atos Nederland B.V.
 group liability cannot be triggered for the message content. Although the
 sender endeavours to maintain a computer virus-free network, the sender
 does not warrant that this transmission is virus-free and will not be
 liable for any damages resulting from any virus transmitted. On all offers
 and agreements under which Atos Nederland B.V. supplies goods and/or
 services of whatever nature, the Terms of Delivery from Atos Nederland B.V.
 exclusively apply. The Terms of Delivery shall be promptly submitted to you
 on your request.
 
  Atos Nederland B.V. / Utrecht
  KvK Utrecht 30132762
 
 
 
 
 
 
 
 
  Dit bericht is vertrouwelijk en kan geheime informatie bevatten enkel
 bestemd voor de geadresseerde. Indien dit bericht niet voor u is bestemd,
 verzoeken wij u dit onmiddellijk aan ons te melden en het 

Re: TestFairScheduler failing - version 0.20. security 204

2011-12-18 Thread Merto Mertek
I figured out that if I run the tests from the console with ant
test-fairscheduler (my modification of the test target in
src/contrib/build.xml), all tests run OK. If I understand this correctly,
testing is probably always done with ant and the test classes are never meant
to be run from the Eclipse IDE.

Because I am rather new to all of this, I would like to hear how you develop
a new feature and how you test it. In my situation I would do it as follows
(a command-level sketch follows the list):
- develop a new feature (make some code modifications)
- build the scheduler with ant
- write unit tests
- run the test class from ant
- deploy the new scheduler build/jar to a cluster
- try it on a working cluster
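A rough command-level sketch of that cycle, assuming the stock branch-0.20
ant build (target names and jar locations can differ slightly between
releases, and the host names are placeholders):

cd $HD_HOME
ant jar                                           # rebuild the core jar (if core classes were touched)
ant test-contrib -Dtestcase=TestFairScheduler     # run just the scheduler tests, if the contrib build honours -Dtestcase
ant package                                       # also builds contrib jars under build/contrib/fairscheduler/
scp build/contrib/fairscheduler/*.jar master:$HD_HOME/lib/   # copy the rebuilt scheduler jar to the jobtracker node
$HD_HOME/bin/stop-all.sh && $HD_HOME/bin/start-all.sh        # on the master: restart so the new jar is picked up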

Is there any other way to try new functionality locally or otherwise? Any
comments and suggestions are welcome.
Thank you..




On 17 December 2011 21:58, Merto Mertek masmer...@gmail.com wrote:

 Hi,

 I am having some problems with running the following test file

 org.apache.hadoop.mapred.TestFairScheduler

 Nearly all tests fail, most of them with the error
 java.lang.RuntimeException: COULD NOT START JT. Here is a trace:
 http://pastebin.com/Jx90sYbw
 The code was checked out from the svn branch, then I ran ant build and ant
 eclipse. The tests were run inside Eclipse.

 I would like to solve those problems before modifying the scheduler. Any
 hints appreciated. Probably just some config issue?

 Thank you






TestFairScheduler failing - version 0.20. security 204

2011-12-17 Thread Merto Mertek
Hi,

I am having some problems with running the following test file

org.apache.hadoop.mapred.TestFairScheduler

Nearly all tests fail, most of them with the error
java.lang.RuntimeException: COULD NOT START JT. Here is a trace:
http://pastebin.com/Jx90sYbw
The code was checked out from the svn branch, then I ran ant build and ant
eclipse. The tests were run inside Eclipse.

I would like to solve these problems before modifying the scheduler. Any
hints are appreciated. Is it perhaps just a configuration issue?

Thank you


Re: Environment consideration for a research on scheduling

2011-09-27 Thread Merto Mertek
The desktop edition was chosen just to run the namenode and to monitor
cluster statistics. The worker nodes run the Ubuntu server edition because we
found this configuration in several research papers. One such configuration
can be found in the paper on the LATE scheduler (is any source code for it
available, or is it integrated into the new fair scheduler?)

thanks for the provided tools..

On 26 September 2011 11:41, Steve Loughran ste...@apache.org wrote:

 On 23/09/11 16:09, GOEKE, MATTHEW (AG/1000) wrote:

 If you are starting from scratch with no prior Hadoop install experience, I
 would configure stand-alone, migrate to pseudo-distributed and then to fully
 distributed, verifying functionality at each step by doing a simple word
 count run. Also, if you don't mind using the CDH distribution, then SCM /
 their rpms will greatly simplify both the bin installs as well as the user
 creation.

 Your VM route will most likely work, but I can imagine the number of
 hiccups during migration from that to the real cluster will not make it
 worth your time.

 Matt

 -Original Message-
 From: Merto Mertek [mailto:masmer...@gmail.com]
 Sent: Friday, September 23, 2011 10:00 AM
 To: common-user@hadoop.apache.org
 Subject: Environment consideration for a research on scheduling

 Hi,
 in the first phase we are planning to establish a small cluster with a few
 commodity computers (each with 1 GB of RAM and a 200 GB disk). The cluster
 would run Ubuntu Server 10.10 and a Hadoop build from the branch 0.20.204 (I
 had some issues with version 0.20.203 with missing libraries:
 http://hadoop-common.472056.n3.nabble.com/Development-enviroment-problems-eclipse-hadoop-0-20-203-td3186022.html#a3188567).
 Would you suggest any other version?


 I wouldn't rush to put Ubuntu 10.x on; it makes a good desktop, but RHEL and
 CentOS are the platforms of choice on the server side.




 In the second phase we are planning to analyse, test and modify some of the
 Hadoop schedulers.


 The main schedulers used by Y! and FB are fairly well tuned for their
 workloads, and apparently not something you'd want to play with. There is at
 least one other scheduler in the contrib/ dir to play with.

 The other thing about scheduling is that you may have a faster development
 cycle if, instead of working on a real cluster, you simulate it at multiples
 of real time, using stats collected from your own workload by way of the
 gridmix2 tools. I've never done scheduling work, but I think there's some
 stuff there to do that. If not, it's a possible contribution.

 Be aware that the changes in 0.23+ will change resource scheduling; that may
 be a better place to do development, with a plan to deploy in 2012. Oh, and
 get on the mapreduce lists, esp. the -dev list, to discuss issues.



  The information contained in this email may be subject to the export
 control laws and regulations of the United States, potentially
 including but not limited to the Export Administration Regulations (EAR)
 and sanctions regulations issued by the U.S. Department of
 Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this
 information you are obligated to comply with all
 applicable U.S. export laws and regulations.


 I have no idea what that means but am not convinced that reading an email
 forces me to comply with a different country's rules



Re: Environment consideration for a research on scheduling

2011-09-24 Thread Merto Mertek
 I agree, we will go the standard route. As you suggested, we will go step
by step to the full cluster deployment. After configuring the first node
we will use Clonezilla to replicate it and then set the nodes up one by one..

On the worker nodes I was thinking of running Ubuntu Server, while the
namenode will run Ubuntu Desktop. I am interested in how I should configure the
environment so that I will be able to remotely monitor, analyse and configure
the cluster. I will submit jobs from outside the local network via ssh to the
namenode; however, in that situation I will not be able to access the web
interfaces of the jobtracker and tasktrackers. So I am wondering how to analyze
them, and how you configured your environment to be as practical as possible.
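
For context, the web interfaces in question are plain HTTP services whose bind
addresses come from mapred-default.xml; the snippet below shows the stock
0.20.x defaults (JobTracker UI on port 50030, TaskTracker UI on 50060), which
one would typically reach from outside by forwarding those ports over the
existing ssh connection to the namenode:

  <!-- mapred-default.xml (shown for reference, not something to change here) -->
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:50060</value>
  </property>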

For monitoring the cluster I saw that Ganglia is one of the options, but at
this stage of testing the job-history files will probably be enough..

On 23 September 2011 17:09, GOEKE, MATTHEW (AG/1000) 
matthew.go...@monsanto.com wrote:

 If you are starting from scratch with no prior Hadoop install experience, I
 would configure stand-alone, migrate to pseudo-distributed and then to fully
 distributed, verifying functionality at each step by doing a simple word
 count run. Also, if you don't mind using the CDH distribution, then SCM /
 their rpms will greatly simplify both the bin installs as well as the user
 creation.

 Your VM route will most likely work, but I can imagine the number of hiccups
 during migration from that to the real cluster will not make it worth your
 time.

 Matt

 -Original Message-
 From: Merto Mertek [mailto:masmer...@gmail.com]
 Sent: Friday, September 23, 2011 10:00 AM
 To: common-user@hadoop.apache.org
 Subject: Environment consideration for a research on scheduling

 Hi,
 in the first phase we are planning to establish a small cluster with a few
 commodity computers (each with 1 GB of RAM and a 200 GB disk). The cluster
 would run Ubuntu Server 10.10 and a Hadoop build from the branch 0.20.204 (I
 had some issues with version 0.20.203 with missing libraries:
 http://hadoop-common.472056.n3.nabble.com/Development-enviroment-problems-eclipse-hadoop-0-20-203-td3186022.html#a3188567).
 Would you suggest any other version?

 In the second phase we are planning to analyse, test and modify some of the
 Hadoop schedulers.

 Now I am interested in what is the best way to deploy Ubuntu and Hadoop to
 these few machines. I was thinking of configuring the system in a local VM
 and then converting it for each physical machine, but probably this is not
 the best option. If you know any other, please share..

 Thank you!




Unsubscribe from jira issues

2011-09-23 Thread Merto Mertek
Hi,
I am receiving messages from two mailing lists (common-dev, common-user)
and I would like to stop receiving messages from JIRA. I am not a member of
the common-issues list. Can I disable this somehow? Thank you


Re: Unsubscribe from jira issues

2011-09-23 Thread Merto Mertek
Probably there is not any option just to disable jira issues.. I will
probably need the common-dev list so I will stay subscribed..

Thank you...

On 23 September 2011 16:11, Harsh J ha...@cloudera.com wrote:

 Merto,

 You need common-dev-unsubscribe@

 The common-dev list receives just JIRA opened/resolved/reopened
 messages. The common-issues receives everything.

 On Fri, Sep 23, 2011 at 7:27 PM, Merto Mertek masmer...@gmail.com wrote:
  Hi,
  I am receiving messages from two mailing lists
 (common-dev, common-user)
  and I would like to stop receiving messages from JIRA. I am not a member of
  the common-issues list. Can I disable this somehow? Thank you
 



 --
 Harsh J



Re: Unsubscribe from jira issues

2011-09-23 Thread Merto Mertek
hehe :) you are right :)

On 23 September 2011 16:21, Harsh J ha...@cloudera.com wrote:

 Merto,

 Am sure your mail client has some form of filtering available in that case!
 :-)

 On Fri, Sep 23, 2011 at 7:49 PM, Merto Mertek masmer...@gmail.com wrote:
  Probably there is no option to disable just the JIRA issues.. I will
  probably need the common-dev list, so I will stay subscribed..
 
  Thank you...
 
  On 23 September 2011 16:11, Harsh J ha...@cloudera.com wrote:
 
  Merto,
 
  You need common-dev-unsubscribe@
 
  The common-dev list receives just JIRA opened/resolved/reopened
  messages. The common-issues receives everything.
 
  On Fri, Sep 23, 2011 at 7:27 PM, Merto Mertek masmer...@gmail.com
 wrote:
   Hi,
   I am receiving messages from two mailing lists
  (common-dev, common-user)
   and I would like to stop receiving messages from JIRA. I am not a member of
   the common-issues list. Can I disable this somehow? Thank you
  
 
 
 
  --
  Harsh J
 
 



 --
 Harsh J



Environment consideration for a research on scheduling

2011-09-23 Thread Merto Mertek
Hi,
in the first phase we are planning to establish a small cluster with a few
commodity computers (each with 1 GB of RAM and a 200 GB disk). The cluster
would run Ubuntu Server 10.10 and a Hadoop build from the branch 0.20.204 (I
had some issues with version 0.20.203 with missing libraries:
http://hadoop-common.472056.n3.nabble.com/Development-enviroment-problems-eclipse-hadoop-0-20-203-td3186022.html#a3188567).
Would you suggest any other version?

In the second phase we are planning to analyse, test and modify some of the
Hadoop schedulers.

Now I am interested in what is the best way to deploy Ubuntu and Hadoop to
these few machines. I was thinking of configuring the system in a local VM and
then converting it for each physical machine, but probably this is not the
best option. If you know any other, please share..

Thank you!