Hadoop-on-demand and torque
If I understand it right, HOD is intended mainly for bringing Hadoop onto existing HPC clusters and for testing purposes. I cannot find what the role of Torque is here (just the initial node allocation?) or which scheduler HOD uses by default - probably the scheduler from the Hadoop distribution? The docs mention the Maui scheduler, but if there were an integration with Hadoop there would presumably be a document on it.. thanks..
Re: Distributing MapReduce on a computer cluster
To get a feel for how load is distributed you can start by reading about the different Hadoop schedulers. I have not studied implementations other than Hadoop's, but a very simplified version of the distribution concept is the following: a) the tasktracker asks for work (its heartbeat carries the status of the worker node, e.g. the number of free slots); b) the jobtracker picks a job from a list that is sorted according to the configured policy (fair scheduling, FIFO, LIFO, or some other SLA); c) the tasktracker executes the map/reduce task. As mentioned before, there are a lot more details.. In b) there is an implementation of delay scheduling, which improves throughput by taking the location of a job's input data into account when picking a task, and a preemption mechanism that regulates fairness between pools, etc.. A good start is the book that Prashant mentioned... On 23 April 2012 23:49, Prashant Kommireddi prash1...@gmail.com wrote: Shailesh, there's a lot that goes into distributing work across tasks/nodes. It's not just distributing work but also fault-tolerance, data locality etc that come into play. It might be good to refer to the Hadoop Apache docs or Tom White's definitive guide. Sent from my iPhone On Apr 23, 2012, at 11:03 AM, Shailesh Samudrala shailesh2...@gmail.com wrote: Hello, I am trying to design my own MapReduce implementation and I want to know how Hadoop is able to distribute its workload across multiple computers. Can anyone shed more light on this? thanks!
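To make the heartbeat/assignment loop above concrete, here is a minimal, self-contained sketch. All class and method names are invented for illustration; this is not the real JobTracker/TaskScheduler API, just the a)-b)-c) flow in a FIFO flavour.

// Illustrative sketch only: a toy model of the heartbeat-driven assignment
// described above. Every name here is made up; it is not Hadoop's API.
import java.util.ArrayDeque;
import java.util.Queue;

public class SimpleFifoSchedulerSketch {
    static class TrackerStatus {
        final String host;
        final int freeMapSlots;           // a) reported in the heartbeat
        TrackerStatus(String host, int freeMapSlots) { this.host = host; this.freeMapSlots = freeMapSlots; }
    }

    static class Job {
        final String id;
        Job(String id) { this.id = id; }
        // A real scheduler would prefer a task whose input split lives on 'host' (data locality).
        String obtainTaskFor(String host) { return id + "_map_on_" + host; }
    }

    private final Queue<Job> queue = new ArrayDeque<Job>(); // FIFO policy; fair/LIFO would order differently

    void submit(Job j) { queue.add(j); }

    // Called when a tasktracker heartbeat reports free slots.
    void onHeartbeat(TrackerStatus tt) {
        for (int slot = 0; slot < tt.freeMapSlots && !queue.isEmpty(); slot++) {
            Job head = queue.peek();                         // b) pick a job according to the policy
            System.out.println("assign " + head.obtainTaskFor(tt.host) + " to " + tt.host); // c) tracker runs it
        }
    }

    public static void main(String[] args) {
        SimpleFifoSchedulerSketch s = new SimpleFifoSchedulerSketch();
        s.submit(new Job("job_201204230001"));
        s.onHeartbeat(new TrackerStatus("node3", 2));        // a) heartbeat with two free map slots
    }
}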
Re: Algorithms used in fairscheduler 0.20.205
Anyone? On 19 April 2012 17:34, Merto Mertek masmer...@gmail.com wrote: The closest document I could find matching the current implementation of the fair scheduler is http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.html from Matei Zaharia et al. Another document, on delay scheduling, is from 2010.. a) Is there any newer documented version of the implementation? b) Besides delay scheduling, the copy-compute splitting algorithm and the fair-share calculation, are there any other algorithms that are important for cluster performance and fair sharing? c) Is there any connection between copy-compute splitting and the MapReduce phases (copy-sort-reduce)? Thank you..
Algorithms used in fairscheduler 0.20.205
The closest document I could find matching the current implementation of the fair scheduler is http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.html from Matei Zaharia et al. Another document, on delay scheduling, is from 2010.. a) Is there any newer documented version of the implementation? b) Besides delay scheduling, the copy-compute splitting algorithm and the fair-share calculation, are there any other algorithms that are important for cluster performance and fair sharing? c) Is there any connection between copy-compute splitting and the MapReduce phases (copy-sort-reduce)? Thank you..
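For anyone else reading the thread: the core of the delay scheduling algorithm from the Zaharia et al. papers mentioned above fits in a few lines. The sketch below is a simplification written from the paper, not the actual FairScheduler code, and the wait threshold is an invented number.

// Simplified delay-scheduling rule (after Zaharia et al.); not the real FairScheduler code.
import java.util.Arrays;
import java.util.List;

public class DelaySchedulingSketch {
    static final long MAX_WAIT_MS = 5000; // how long a job may be skipped while waiting for a local slot (made up)

    static class PendingJob {
        final String name;
        final List<String> hostsWithLocalData; // hosts holding this job's unscheduled input blocks
        long skippedSinceMs = -1;              // first time the job was skipped for lack of locality
        PendingJob(String name, List<String> hosts) { this.name = name; this.hostsWithLocalData = hosts; }
    }

    // Decide whether 'job' may launch a task on 'host' right now.
    static boolean allowLaunch(PendingJob job, String host, long nowMs) {
        if (job.hostsWithLocalData.contains(host)) {
            job.skippedSinceMs = -1;           // locality achieved, reset the wait
            return true;                       // launch a node-local task
        }
        if (job.skippedSinceMs < 0) job.skippedSinceMs = nowMs;
        return nowMs - job.skippedSinceMs >= MAX_WAIT_MS; // run non-locally only after waiting long enough
    }

    public static void main(String[] args) {
        PendingJob j = new PendingJob("job_42", Arrays.asList("nodeA", "nodeB"));
        System.out.println(allowLaunch(j, "nodeA", 0L));    // true: local launch
        System.out.println(allowLaunch(j, "nodeC", 1000L)); // false: skip and start waiting
        System.out.println(allowLaunch(j, "nodeC", 7000L)); // true: waited long enough, allowed to run non-locally
    }
}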
Fairscheduler - disable default pool
I know that by design all unmarked jobs go to the default pool; however, I am doing some testing and I am interested in whether it is possible to disable it.. Thanks
Re: Fairscheduler - disable default pool
Thanks for your workaround, but I think that with this you just put a constraint on the pool so that it will not accept any job. I am doing some calculations with weights and do not want the default pool's weight to be included in the computation.. On 13 March 2012 18:52, Jean-Daniel Cryans jdcry...@apache.org wrote: We do it here by setting this: <poolMaxJobsDefault>0</poolMaxJobsDefault> So that you _must_ have a pool (that's configured with a different maxRunningJobs) in order to run jobs. Hope this helps, J-D On Tue, Mar 13, 2012 at 10:49 AM, Merto Mertek masmer...@gmail.com wrote: I know that by design all unmarked jobs go to the default pool; however, I am doing some testing and I am interested in whether it is possible to disable it.. Thanks
Re: Hadoop fair scheduler doubt: allocate jobs to pool
From the fair scheduler docs I assume the following should work:

<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>
<property>
  <name>pool.name</name>
  <value>${mapreduce.job.group.name}</value>
</property>

which means that the default pool will be the group of the user that submitted the job. In your case I think that allocations.xml is correct. If you want to explicitly assign a job to a specific pool defined in your allocation.xml file, you can do it as follows: Configuration conf3 = conf; conf3.set("pool.name", "pool3"); // conf.set("property.name", "value") Let me know if it works.. On 29 February 2012 14:18, Austin Chungath austi...@gmail.com wrote: How can I set the fair scheduler such that all jobs submitted from a particular user group go to a pool with the group name? I have set up the fair scheduler and I have two users: A and B (belonging to the user group hadoop). When these users submit hadoop jobs, the jobs from A go to a pool named A and the jobs from B go to a pool named B. I want them to go to a pool with their group name, so I tried adding the following to mapred-site.xml:

<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>group.name</value>
</property>

But instead the jobs now go to the default pool. I want the jobs submitted by A and B to go to the pool named hadoop. How do I do that? Also, how can I explicitly set a job to any specified pool? I have set the allocation file (fair-scheduler.xml) like this:

<allocations>
  <pool name="hadoop">
    <minMaps>1</minMaps>
    <minReduces>1</minReduces>
    <maxMaps>3</maxMaps>
    <maxReduces>3</maxReduces>
  </pool>
  <userMaxJobsDefault>5</userMaxJobsDefault>
</allocations>

Any help is greatly appreciated. Thanks, Austin
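For completeness, here is a minimal job-submission sketch using the old mapred API. It assumes, as in the configuration above, that mapred.fairscheduler.poolnameproperty resolves to pool.name; the class name, pool name and paths are placeholders.

// Minimal sketch: send a job into a specific fair-scheduler pool.
// Assumes mapred.fairscheduler.poolnameproperty points at "pool.name" as above;
// input/output paths are placeholders and the identity map/reduce defaults are used.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitToPool {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitToPool.class);
        conf.setJobName("job-in-pool3");
        conf.set("pool.name", "pool3");   // lands the job in pool "pool3"
        // Mapper/reducer left at the identity defaults; this example is only about pool placement.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);           // submits and waits for completion
    }
}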
Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
Varun, sorry for my late response. Today I deployed a new version and I can confirm that the patches you provided work well. I've been running some jobs on a 5-node cluster for an hour under full load without a core, so now things work as expected. Thank you again! I used just your first option.. On 15 February 2012 19:53, mete efk...@gmail.com wrote: Well rebuilding ganglia seemed easier and Merto was testing the other so i thought that i should give that one a chance :) anyway i will send you gdb details or patch hadoop and try it at my earliest convenience Cheers On Wed, Feb 15, 2012 at 6:59 PM, Varun Kapoor rez...@hortonworks.com wrote: The warnings about underflow are totally expected (they come from strtod(), and they will no longer occur with Hadoop-1.0.1, which applies my patch from HADOOP-8052), so that's not worrisome. As for the buffer overflow, do you think you could show me a backtrace of this core? If you can't find the core file on disk, just start gmetad under gdb, like so: $ sudo gdb path to gmetad (gdb) r --conf=path to your gmetad.conf ... ::Wait for crash:: (gdb) bt (gdb) info locals If you're familiar with gdb, then I'd appreciate any additional diagnosis you could perform (for example, to figure out which metric's value caused this buffer overflow) - if you're not, I'll try and send you some gdb scripts to narrow things down once I see the output from this round of debugging. Also, out of curiosity, is patching Hadoop not an option for you? Or is it just that rebuilding (and redeploying) ganglia is the lesser of the 2 evils? :) Varun On Tue, Feb 14, 2012 at 11:43 PM, mete efk...@gmail.com wrote: Hello Varun, i have patched and recompiled ganglia from source but it still cores after the patch. Here are some logs: Feb 15 09:39:14 master gmetad[16487]: RRD_update (/var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd): /var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd: converting '4.9E-324' to float: Numerical result out of range Feb 15 09:39:14 master gmetad[16487]: RRD_update (/var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd): /var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd: converting '4.9E-324' to float: Numerical result out of range Feb 15 09:39:14 master gmetad[16487]: RRD_update (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd): /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd: converting '4.9E-324' to float: Numerical result out of range Feb 15 09:39:14 master gmetad[16487]: RRD_update (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd): /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd: converting '4.9E-324' to float: Numerical result out of range Feb 15 09:39:14 master gmetad[16487]: RRD_update (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd): /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd: converting '4.9E-324' to float: Numerical result out of range Feb 15 09:39:14 master gmetad[16487]: *** buffer overflow detected ***: gmetad terminated i am using the hadoop-1.0.0 and ganglia 3.2.0 tarballs. Cheers Mete On Sat, Feb 11, 2012 at 2:19 AM, Merto Mertek masmer...@gmail.com wrote: Varun unfortunately I have had some problems with deploying a new version on the cluster..
Hadoop is not picking up the new build in the lib folder even though the classpath is set to it. The new build is picked up only if I put it in $HD_HOME/share/hadoop/, which is very strange.. I've done this on all nodes and can access the web UI, but all tasktrackers are being stopped because of an error: INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Cleanup... java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926) Probably the error is a consequence of an inadequately deployed jar.. I will ask on the dev list how they do it, or do you maybe have any other idea? On 10 February 2012 17:10, Varun Kapoor rez...@hortonworks.com wrote: Hey Merto, Any luck getting the patch running on your cluster? In case you're interested, there's now a JIRA for this: https://issues.apache.org/jira/browse/HADOOP-8052. Varun On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor rez
Re: Tasktracker fails
Hm.. I would first try to stop all the daemons with $HADOOP_HOME/bin/stop-all.sh. Afterwards check on the master and on one of the slaves that no daemons are running (jps). Maybe you could also check that the tasktrackers' configuration points to the right jobtracker (mapred-site.xml). Do you see any error in the jobtracker log too? On 22 February 2012 09:44, Adarsh Sharma adarsh.sha...@orkash.com wrote: Any update on the below issue. Thanks Adarsh Sharma wrote: Dear all, Today I am trying to configure hadoop-0.20.205.0 on a 4 node cluster. When I start my cluster, all daemons get started except the tasktracker; I don't know why the task tracker fails with the following error logs. The cluster is in a private network. My /etc/hosts file contains all IP hostname resolution entries on all nodes. 2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered. 2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.SocketException: Invalid argument at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at org.apache.hadoop.ipc.Server.bind(Server.java:225) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301) at org.apache.hadoop.ipc.Server.init(Server.java:1483) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545) at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1428) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673) Any comments on the issue. Thanks
Re: Tasktracker fails
I do not know exactly how the distribution and splitting of deflate files works, if that is your question, but you will probably find something useful in the *Codec classes, where the implementations of the various compression formats are located. Deflate files are just one type of compressed file that you can use for storing data in your system. There are several other types, depending on your needs and the trade-offs you are dealing with (space versus time spent compressing). Globs, I think, are just a matching strategy for selecting files/folders with wildcard patterns.. On 22 February 2012 19:29, Jay Vyas jayunit...@gmail.com wrote: Hi guys ! I'm trying to understand the way globstatus / deflate files work in hdfs. I can't read them using the globStatus API in the hadoop FileSystem, from java. The specifics are here if anyone wants some easy stackoverflow points :) http://stackoverflow.com/questions/9400739/hadoop-globstatus-and-deflate-files On Wed, Feb 22, 2012 at 7:39 AM, Merto Mertek masmer...@gmail.com wrote: Hm.. I would first try to stop all the daemons with $HADOOP_HOME/bin/stop-all.sh. Afterwards check on the master and on one of the slaves that no daemons are running (jps). Maybe you could also check that the tasktrackers' configuration points to the right jobtracker (mapred-site.xml). Do you see any error in the jobtracker log too? On 22 February 2012 09:44, Adarsh Sharma adarsh.sha...@orkash.com wrote: Any update on the below issue. Thanks Adarsh Sharma wrote: Dear all, Today I am trying to configure hadoop-0.20.205.0 on a 4 node cluster. When I start my cluster, all daemons get started except the tasktracker; I don't know why the task tracker fails with the following error logs. The cluster is in a private network. My /etc/hosts file contains all IP hostname resolution entries on all nodes. 2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered. 2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.SocketException: Invalid argument at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at org.apache.hadoop.ipc.Server.bind(Server.java:225) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301) at org.apache.hadoop.ipc.Server.init(Server.java:1483) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545) at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1428) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673) Any comments on the issue. Thanks -- Jay Vyas MMSB/UCHC
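To make the two pieces concrete (codec lookup and glob matching), here is a short sketch that globs for .deflate outputs and reads them through the codec resolved from the file extension. The paths are made up, and it assumes the default codecs are on the classpath.

// Read compressed "*.deflate" part files matched with globStatus.
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class ReadDeflateFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        CompressionCodecFactory codecs = new CompressionCodecFactory(conf);

        // Glob pattern, not a regex: '*' matches within a single path component.
        FileStatus[] matches = fs.globStatus(new Path("/user/jay/output/part-*.deflate"));
        if (matches == null) return;                                // pattern's parent does not exist
        for (FileStatus st : matches) {
            CompressionCodec codec = codecs.getCodec(st.getPath()); // resolved from the file extension
            BufferedReader r = new BufferedReader(new InputStreamReader(
                    codec.createInputStream(fs.open(st.getPath()))));
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);                           // decompressed text
            }
            r.close();
        }
    }
}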
Re: Dynamic changing of slaves
I think the job configuration does not allow you such a setup, but maybe I missed something.. I would probably tackle this problem from the scheduler source. The default scheduler is JobQueueTaskScheduler, which maintains a FIFO-based queue. When a tasktracker (your slave) tells the jobtracker that it has some free slots, the JT, in the heartbeat method, calls the scheduler's assignTasks method, where tasks are assigned on a locality basis. In other words, the scheduler tries to find a task on the tasktracker whose input data resides on that node. If the scheduler does not find a local map/reduce task to run, it will try to find a non-local one. This is probably the point where you should do something with your jobs and wait for the tasktracker's heartbeat. Instead of waiting for the TT heartbeat, there may be an option to force a heartbeatResponse even though the TT has not sent a heartbeat, but I am not aware of one.. On 21 February 2012 19:27, theta glynisdso...@email.arizona.edu wrote: Hi, I am working on a project which requires a setup as follows: one master with four slaves. However, when a map-only program is run, the master dynamically selects the slave to run the map. For example, when the program is run for the first time, slave 2 is selected to run the map and reduce programs, and the output is stored on dfs. When the program is run the second time, slave 3 is selected and so on. I am currently using Hadoop 0.20.2 with Ubuntu 11.10. Any ideas on creating the setup as described above? Regards -- View this message in context: http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
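Purely as an illustration of the idea above (invented names, not the JobQueueTaskScheduler API): keep a rotating "designated" slave per job and only hand out tasks when the heartbeating tracker matches it.

// Hypothetical sketch of rotating which slave gets the next job's tasks.
import java.util.ArrayDeque;
import java.util.Queue;

public class RotatingAssignmentSketch {
    private final Queue<String> slaves = new ArrayDeque<String>();

    public RotatingAssignmentSketch(String... slaveHosts) {
        for (String s : slaveHosts) slaves.add(s);
    }

    // Pick the slave for the next submitted job and move it to the back of the rotation.
    public String designateSlaveForNextJob() {
        String chosen = slaves.poll();
        slaves.add(chosen);
        return chosen;
    }

    // Called from the heartbeat path: assign only if the caller is the designated slave,
    // otherwise return nothing and wait for the right tracker's next heartbeat.
    public boolean mayAssign(String designatedSlave, String heartbeatingTracker) {
        return designatedSlave.equals(heartbeatingTracker);
    }

    public static void main(String[] args) {
        RotatingAssignmentSketch r = new RotatingAssignmentSketch("slave1", "slave2", "slave3", "slave4");
        String target = r.designateSlaveForNextJob();        // e.g. slave1 for the first job
        System.out.println(r.mayAssign(target, "slave2"));   // false: keep waiting
        System.out.println(r.mayAssign(target, "slave1"));   // true: hand the map tasks to slave1
    }
}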
Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
Varun unfortunately I have had some problems with deploying a new version on the cluster.. Hadoop is not picking the new build in lib folder despite a classpath is set to it. The new build is picked just if I put it in the $HD_HOME/share/hadoop/, which is very strange.. I've done this on all nodes and can access the web, but all tasktracker are being stopped because of an error: INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Cleanup... java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926) Probably the error is the consequence of an inadequate deploy of a jar.. I will ask to the dev list how they do it or are you maybe having any other idea? On 10 February 2012 17:10, Varun Kapoor rez...@hortonworks.com wrote: Hey Merto, Any luck getting the patch running on your cluster? In case you're interested, there's now a JIRA for this: https://issues.apache.org/jira/browse/HADOOP-8052. Varun On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor rez...@hortonworks.com wrote: Your general procedure sounds correct (i.e. dropping your newly built .jar into $HD_HOME/lib/), but to make sure it's getting picked up, you should explicitly add $HD_HOME/lib/ to your exported HADOOP_CLASSPATH environment variable; here's mine, as an example: export HADOOP_CLASSPATH=.:./build/*.jar About your second point, you certainly need to copy this newly patched .jar to every node in your cluster, because my patch changes the value of a couple metrics emitted TO gmetad (FROM all the nodes in the cluster), so without copying it over to every node in the cluster, gmetad will still likely receive some bad metrics. Varun On Wed, Feb 8, 2012 at 6:19 PM, Merto Mertek masmer...@gmail.com wrote: I will need your help. Please confirm if the following procedure is right. I have a dev environment where I pimp my scheduler (no hadoop running) and a small cluster environment where the changes(jars) are deployed with some scripts, however I have never compiled the whole hadoop from source so I do not know if I am doing it right. I' ve done it as follow: a) apply a patch b) cd $HD_HOME; ant c) copy $HD_HOME/*build*/patched-core-hadoop.jar - cluster:/$HD_HOME/*lib* d) run $HD_HOME/bin/start-all.sh Is this enough? When I tried to test hadoop dfs -ls / I could see that a new jar was not loaded and instead a jar from $HD_HOME/*share*/hadoop-20.205.0.jar was taken.. Should I copy the entire hadoop folder to all nodes and reconfigure the entire cluster for the new build, or is enough if I configure it just on the node where gmetad will run? On 8 February 2012 06:33, Varun Kapoor rez...@hortonworks.com wrote: I'm so sorry, Merto - like a silly goose, I attached the 2 patches to my reply, and of course the mailing list did not accept the attachment. I plan on opening JIRAs for this tomorrow, but till then, here are links to the 2 patches (from my Dropbox account): - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.Hadoop.patch - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.gmetad.patch Here's hoping this works for you, Varun On Tue, Feb 7, 2012 at 6:00 PM, Merto Mertek masmer...@gmail.com wrote: Varun, have I missed your link to the patches? I have tried to search them on jira but I did not find them.. Can you repost the link for these two patches? Thank you.. 
On 7 February 2012 20:36, Varun Kapoor rez...@hortonworks.com wrote: I'm sorry to hear that gmetad cores continuously for you guys. Since I'm not seeing that behavior, I'm going to just put out the 2 possible patches you could apply and wait to hear back from you. :) Option 1 * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file ( http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markupinmysetup ) in your Hadoop sources and rebuild Hadoop. Option 2 * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and rebuild gmetad. Only 1 of these 2 fixes is required, and it would help me if you could first try Option 1 and let me know if that fixes things for you. Varun On Mon, Feb 6, 2012 at 10:36 PM, mete efk...@gmail.com wrote: Same with Merto's situation here, it always overflows short time after the restart. Without the hadoop metrics enabled everything is smooth. Regards Mete On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek masmer...@gmail.com wrote: I have tried to run it but it repeats crashing.. - When you start gmetad
Compile and deploy a new version of hadoop
I am having some trouble understanding how the whole thing works.. Compiling with ant works fine and I am able to build a jar which is afterwards deployed to the cluster. On the cluster I have set the HADOOP_CLASSPATH variable to point to the jar files in the lib folder ($HD_HOME/lib/*.jar), where I put the newly compiled hadoop-core-myversion.jar. Before deploying I make sure that in the $HD_HOME folder and in $HD_HOME/lib there is no previous version of hadoop-core-xxx.jar or core-3.3.1.jar. The problem is that I suspect Hadoop is picking up the wrong hadoop-core jars, so I am interested in how the whole mechanism works and what the purpose is of the $HD_HOME/share/hadoop folder, where I can find other hadoop-core jars and which is included in the classpath in hadoop-env.sh. My last question is: what is the easiest way to see that your build is up and running? Maybe from the release tag in the JT? Thank you..
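Two ways to answer the last question: bin/hadoop version prints the version, svn revision and compile time of the build that is actually on the classpath, and the jobtracker web UI shows the same version/compiled fields in its summary. If you want to see exactly which jar a class was loaded from, a tiny probe like the one below works; the class name and example path are only illustrative.

// Print the location (jar or directory) a Hadoop class was actually loaded from.
public class WhichHadoopJar {
    public static void main(String[] args) throws Exception {
        Class<?> c = Class.forName("org.apache.hadoop.mapred.JobTracker");
        // e.g. file:/usr/local/hadoop/lib/hadoop-core-myversion.jar if the lib copy won the classpath race
        System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
    }
}

Run it on a cluster node with the same classpath the daemons use; if it prints the jar under share/hadoop rather than your lib copy, that is the jar being picked up.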
Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
I will need your help. Please confirm if the following procedure is right. I have a dev environment where I pimp my scheduler (no hadoop running) and a small cluster environment where the changes(jars) are deployed with some scripts, however I have never compiled the whole hadoop from source so I do not know if I am doing it right. I' ve done it as follow: a) apply a patch b) cd $HD_HOME; ant c) copy $HD_HOME/*build*/patched-core-hadoop.jar - cluster:/$HD_HOME/*lib* d) run $HD_HOME/bin/start-all.sh Is this enough? When I tried to test hadoop dfs -ls / I could see that a new jar was not loaded and instead a jar from $HD_HOME/*share*/hadoop-20.205.0.jar was taken.. Should I copy the entire hadoop folder to all nodes and reconfigure the entire cluster for the new build, or is enough if I configure it just on the node where gmetad will run? On 8 February 2012 06:33, Varun Kapoor rez...@hortonworks.com wrote: I'm so sorry, Merto - like a silly goose, I attached the 2 patches to my reply, and of course the mailing list did not accept the attachment. I plan on opening JIRAs for this tomorrow, but till then, here are links to the 2 patches (from my Dropbox account): - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.Hadoop.patch - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.gmetad.patch Here's hoping this works for you, Varun On Tue, Feb 7, 2012 at 6:00 PM, Merto Mertek masmer...@gmail.com wrote: Varun, have I missed your link to the patches? I have tried to search them on jira but I did not find them.. Can you repost the link for these two patches? Thank you.. On 7 February 2012 20:36, Varun Kapoor rez...@hortonworks.com wrote: I'm sorry to hear that gmetad cores continuously for you guys. Since I'm not seeing that behavior, I'm going to just put out the 2 possible patches you could apply and wait to hear back from you. :) Option 1 * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file ( http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markupinmysetup) in your Hadoop sources and rebuild Hadoop. Option 2 * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and rebuild gmetad. Only 1 of these 2 fixes is required, and it would help me if you could first try Option 1 and let me know if that fixes things for you. Varun On Mon, Feb 6, 2012 at 10:36 PM, mete efk...@gmail.com wrote: Same with Merto's situation here, it always overflows short time after the restart. Without the hadoop metrics enabled everything is smooth. Regards Mete On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek masmer...@gmail.com wrote: I have tried to run it but it repeats crashing.. - When you start gmetad and Hadoop is not emitting metrics, everything is peachy. Right, running just ganglia without running hadoop jobs seems stable for at least a day.. - When you start Hadoop (and it thus starts emitting metrics), gmetad cores. True, with a following error : *** stack smashing detected ***: gmetad terminated \n Segmentation fault - On my MacBookPro, it's a SIGABRT due to a buffer overflow. I believe this is happening for everyone. What I would like for you to try out are the following 2 scenarios: - Once gmetad cores, if you start it up again, does it core again? Does this process repeat ad infinitum? - On my MBP, the core is a one-time thing, and restarting gmetad after the first core makes things run perfectly smoothly. 
- I know others are saying this core occurs continuously, but they were all using ganglia-3.1.x, and I'm interested in how ganglia-3.2.0 behaves for you. It cores everytime I run it. The difference is just that sometimes a segmentation faults appears instantly, and sometimes it appears after a random time...lets say after a minute of running gmetad and collecting data. - If you start Hadoop first (so gmetad is not running when the first batch of Hadoop metrics are emitted) and THEN start gmetad after a few seconds, do you still see gmetad coring? Yes - On my MBP, this sequence works perfectly fine, and there are no gmetad cores whatsoever. I have tested this scenario with 2 working nodes so two gmond plus the head gmond on the server where gmetad is located. I have checked and all of them are versioned 3.2.0. Hope it helps.. Bear in mind that this only addresses the gmetad coring issue - the warnings emitted about '4.9E-324' being out of range will continue, but I know what's causing
Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
Varun, have I missed your link to the patches? I have tried to search them on jira but I did not find them.. Can you repost the link for these two patches? Thank you.. On 7 February 2012 20:36, Varun Kapoor rez...@hortonworks.com wrote: I'm sorry to hear that gmetad cores continuously for you guys. Since I'm not seeing that behavior, I'm going to just put out the 2 possible patches you could apply and wait to hear back from you. :) Option 1 * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file ( http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markupin my setup) in your Hadoop sources and rebuild Hadoop. Option 2 * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and rebuild gmetad. Only 1 of these 2 fixes is required, and it would help me if you could first try Option 1 and let me know if that fixes things for you. Varun On Mon, Feb 6, 2012 at 10:36 PM, mete efk...@gmail.com wrote: Same with Merto's situation here, it always overflows short time after the restart. Without the hadoop metrics enabled everything is smooth. Regards Mete On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek masmer...@gmail.com wrote: I have tried to run it but it repeats crashing.. - When you start gmetad and Hadoop is not emitting metrics, everything is peachy. Right, running just ganglia without running hadoop jobs seems stable for at least a day.. - When you start Hadoop (and it thus starts emitting metrics), gmetad cores. True, with a following error : *** stack smashing detected ***: gmetad terminated \n Segmentation fault - On my MacBookPro, it's a SIGABRT due to a buffer overflow. I believe this is happening for everyone. What I would like for you to try out are the following 2 scenarios: - Once gmetad cores, if you start it up again, does it core again? Does this process repeat ad infinitum? - On my MBP, the core is a one-time thing, and restarting gmetad after the first core makes things run perfectly smoothly. - I know others are saying this core occurs continuously, but they were all using ganglia-3.1.x, and I'm interested in how ganglia-3.2.0 behaves for you. It cores everytime I run it. The difference is just that sometimes a segmentation faults appears instantly, and sometimes it appears after a random time...lets say after a minute of running gmetad and collecting data. - If you start Hadoop first (so gmetad is not running when the first batch of Hadoop metrics are emitted) and THEN start gmetad after a few seconds, do you still see gmetad coring? Yes - On my MBP, this sequence works perfectly fine, and there are no gmetad cores whatsoever. I have tested this scenario with 2 working nodes so two gmond plus the head gmond on the server where gmetad is located. I have checked and all of them are versioned 3.2.0. Hope it helps.. Bear in mind that this only addresses the gmetad coring issue - the warnings emitted about '4.9E-324' being out of range will continue, but I know what's causing that as well (and hope that my patch fixes it for free). Varun On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek masmer...@gmail.com wrote: Yes I am encoutering the same problems and like Mete said few seconds after restarting a segmentation fault appears.. here is my conf.. 
http://pastebin.com/VgBjp08d And here are some info from /var/log/messages (ubuntu server 10.10): kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000] When I compiled gmetad I used the following command: ./configure --with-gmetad --sysconfdir=/etc/ganglia CPPFLAGS=-I/usr/local/rrdtool-1.4.7/include CFLAGS=-I/usr/local/rrdtool-1.4.7/include LDFLAGS=-L/usr/local/rrdtool-1.4.7/lib The same was tried with rrdtool 1.4.5. My current ganglia version is 3.2.0 and like Mete I tried it with version 3.1.7 but without success.. Hope we will sort it out soon any solution.. thank you On 6 February 2012 20:09, mete efk...@gmail.com wrote: Hello, i also face this issue when using GangliaContext31 and hadoop-1.0.0, and ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows as soon as i restart the gmetad. Regards Mete On Mon, Feb 6, 2012 at 7:42 PM, Vitthal Suhas Gogate gog...@hortonworks.com wrote: I assume you have seen the following information on Hadoop twiki, http://wiki.apache.org/hadoop/GangliaMetrics So do you use GangliaContext31 in hadoop-metrics2.properties? We use Ganglia 3.2 with Hadoop
Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
Yes I am encoutering the same problems and like Mete said few seconds after restarting a segmentation fault appears.. here is my conf.. http://pastebin.com/VgBjp08d And here are some info from /var/log/messages (ubuntu server 10.10): kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000] When I compiled gmetad I used the following command: ./configure --with-gmetad --sysconfdir=/etc/ganglia CPPFLAGS=-I/usr/local/rrdtool-1.4.7/include CFLAGS=-I/usr/local/rrdtool-1.4.7/include LDFLAGS=-L/usr/local/rrdtool-1.4.7/lib The same was tried with rrdtool 1.4.5. My current ganglia version is 3.2.0 and like Mete I tried it with version 3.1.7 but without success.. Hope we will sort it out soon any solution.. thank you On 6 February 2012 20:09, mete efk...@gmail.com wrote: Hello, i also face this issue when using GangliaContext31 and hadoop-1.0.0, and ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows as soon as i restart the gmetad. Regards Mete On Mon, Feb 6, 2012 at 7:42 PM, Vitthal Suhas Gogate gog...@hortonworks.com wrote: I assume you have seen the following information on Hadoop twiki, http://wiki.apache.org/hadoop/GangliaMetrics So do you use GangliaContext31 in hadoop-metrics2.properties? We use Ganglia 3.2 with Hadoop 20.205 and works fine (I remember seeing gmetad sometime goes down due to buffer overflow problem when hadoop starts pumping in the metrics.. but restarting works.. let me know if you face same problem? --Suhas Additionally, the Ganglia protocol change significantly between Ganglia 3.0 and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with Ganglia 3.0 clients). This caused Hadoop to not work with Ganglia 3.1; there is a patch available for this, HADOOP-4675. As of November 2010, this patch has been rolled into the mainline for 0.20.2 and later. To use the Ganglia 3.1 protocol in place of the 3.0, substitute org.apache.hadoop.metrics.ganglia.GangliaContext31 for org.apache.hadoop.metrics.ganglia.GangliaContext in the hadoop-metrics.properties lines above. On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek masmer...@gmail.com wrote: I spent a lot of time to figure it out however i did not find a solution. Problems from the logs pointed me for some bugs in rrdupdate tool, however i tried to solve it with different versions of ganglia and rrdtool but the error is the same. Segmentation fault appears after the following lines, if I run gmetad in debug mode... Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd which I suppose are generated from MetricsSystemImpl.java (Is there any way just to disable this two metrics?) From the /var/log/messages there are a lot of errors: xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd): converting '4.9E-324' to float: Numerical result out of range xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd): converting '4.9E-324' to float: Numerical result out of range so probably there are some converting issues ? Where should I look for the solution? Would you rather suggest to use ganglia 3.0.x with the old protocol and leave the version 3.1 for further releases? any help is realy appreciated... On 1 February 2012 04:04, Merto Mertek masmer...@gmail.com wrote: I would be glad to hear that too.. 
I've setup the following: Hadoop 0.20.205 Ganglia Front 3.1.7 Ganglia Back *(gmetad)* 3.1.7 RRDTool http://www.rrdtool.org/ 1.4.5. - i had some troubles installing 1.4.4 Ganglia works just in case hadoop is not running, so metrics are not publshed to gmetad node (conf with new hadoop-metrics2.proprieties). When hadoop is started, a segmentation fault appears in gmetad deamon: sudo gmetad -d 2 ... Updating host xxx, metric dfs.FSNamesystem.BlocksTotal Updating host xxx, metric bytes_in Updating host xxx, metric bytes_out Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time Created rrd /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd Segmentation fault And some info from the apache log http://pastebin.com/nrqKRtKJ.. Can someone suggest a ganglia version that is tested with hadoop 0.20.205? I will try to sort it out however it seems a not so tribial problem.. Thank you On 2 December 2011 12:32, praveenesh kumar praveen...@gmail.com wrote: or Do I have to apply some hadoop patch
Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
I have tried to run it but it repeats crashing.. - When you start gmetad and Hadoop is not emitting metrics, everything is peachy. Right, running just ganglia without running hadoop jobs seems stable for at least a day.. - When you start Hadoop (and it thus starts emitting metrics), gmetad cores. True, with a following error : *** stack smashing detected ***: gmetad terminated \n Segmentation fault - On my MacBookPro, it's a SIGABRT due to a buffer overflow. I believe this is happening for everyone. What I would like for you to try out are the following 2 scenarios: - Once gmetad cores, if you start it up again, does it core again? Does this process repeat ad infinitum? - On my MBP, the core is a one-time thing, and restarting gmetad after the first core makes things run perfectly smoothly. - I know others are saying this core occurs continuously, but they were all using ganglia-3.1.x, and I'm interested in how ganglia-3.2.0 behaves for you. It cores everytime I run it. The difference is just that sometimes a segmentation faults appears instantly, and sometimes it appears after a random time...lets say after a minute of running gmetad and collecting data. - If you start Hadoop first (so gmetad is not running when the first batch of Hadoop metrics are emitted) and THEN start gmetad after a few seconds, do you still see gmetad coring? Yes - On my MBP, this sequence works perfectly fine, and there are no gmetad cores whatsoever. I have tested this scenario with 2 working nodes so two gmond plus the head gmond on the server where gmetad is located. I have checked and all of them are versioned 3.2.0. Hope it helps.. Bear in mind that this only addresses the gmetad coring issue - the warnings emitted about '4.9E-324' being out of range will continue, but I know what's causing that as well (and hope that my patch fixes it for free). Varun On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek masmer...@gmail.com wrote: Yes I am encoutering the same problems and like Mete said few seconds after restarting a segmentation fault appears.. here is my conf.. http://pastebin.com/VgBjp08d And here are some info from /var/log/messages (ubuntu server 10.10): kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000] When I compiled gmetad I used the following command: ./configure --with-gmetad --sysconfdir=/etc/ganglia CPPFLAGS=-I/usr/local/rrdtool-1.4.7/include CFLAGS=-I/usr/local/rrdtool-1.4.7/include LDFLAGS=-L/usr/local/rrdtool-1.4.7/lib The same was tried with rrdtool 1.4.5. My current ganglia version is 3.2.0 and like Mete I tried it with version 3.1.7 but without success.. Hope we will sort it out soon any solution.. thank you On 6 February 2012 20:09, mete efk...@gmail.com wrote: Hello, i also face this issue when using GangliaContext31 and hadoop-1.0.0, and ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows as soon as i restart the gmetad. Regards Mete On Mon, Feb 6, 2012 at 7:42 PM, Vitthal Suhas Gogate gog...@hortonworks.com wrote: I assume you have seen the following information on Hadoop twiki, http://wiki.apache.org/hadoop/GangliaMetrics So do you use GangliaContext31 in hadoop-metrics2.properties? We use Ganglia 3.2 with Hadoop 20.205 and works fine (I remember seeing gmetad sometime goes down due to buffer overflow problem when hadoop starts pumping in the metrics.. but restarting works.. let me know if you face same problem? 
--Suhas Additionally, the Ganglia protocol change significantly between Ganglia 3.0 and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with Ganglia 3.0 clients). This caused Hadoop to not work with Ganglia 3.1; there is a patch available for this, HADOOP-4675. As of November 2010, this patch has been rolled into the mainline for 0.20.2 and later. To use the Ganglia 3.1 protocol in place of the 3.0, substitute org.apache.hadoop.metrics.ganglia.GangliaContext31 for org.apache.hadoop.metrics.ganglia.GangliaContext in the hadoop-metrics.properties lines above. On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek masmer...@gmail.com wrote: I spent a lot of time to figure it out however i did not find a solution. Problems from the logs pointed me for some bugs in rrdupdate tool, however i tried to solve it with different versions of ganglia and rrdtool but the error is the same. Segmentation fault appears after the following lines, if I run gmetad in debug mode... Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd Created rrd /var/lib/ganglia
Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
I spent a lot of time trying to figure it out, but I did not find a solution. The logs pointed me to some bugs in the rrdupdate tool; however, I tried different versions of ganglia and rrdtool and the error stays the same. A segmentation fault appears after the following lines if I run gmetad in debug mode... Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd which I suppose are generated from MetricsSystemImpl.java (Is there any way just to disable these two metrics?) In /var/log/messages there are a lot of errors: xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd): converting '4.9E-324' to float: Numerical result out of range xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd): converting '4.9E-324' to float: Numerical result out of range so probably there are some conversion issues? Where should I look for the solution? Would you rather suggest using ganglia 3.0.x with the old protocol and leaving version 3.1 for later releases? Any help is really appreciated... On 1 February 2012 04:04, Merto Mertek masmer...@gmail.com wrote: I would be glad to hear that too.. I've set up the following: Hadoop 0.20.205, Ganglia front end 3.1.7, Ganglia back end (gmetad) 3.1.7, RRDTool http://www.rrdtool.org/ 1.4.5 - I had some trouble installing 1.4.4. Ganglia works only when hadoop is not running, i.e. while no metrics are published to the gmetad node (configured with the new hadoop-metrics2.properties). When hadoop is started, a segmentation fault appears in the gmetad daemon: sudo gmetad -d 2 ... Updating host xxx, metric dfs.FSNamesystem.BlocksTotal Updating host xxx, metric bytes_in Updating host xxx, metric bytes_out Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time Created rrd /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd Segmentation fault And here is some info from the apache log: http://pastebin.com/nrqKRtKJ.. Can someone suggest a ganglia version that is tested with hadoop 0.20.205? I will try to sort it out, but it does not seem a trivial problem.. Thank you On 2 December 2011 12:32, praveenesh kumar praveen...@gmail.com wrote: or Do I have to apply some hadoop patch for this ? Thanks, Praveenesh
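One observation that may help narrow it down: '4.9E-324' is exactly java.lang.Double.MIN_VALUE, and that value cannot be represented as a non-zero 32-bit float, which is why the RRD update path reports it as out of range. A tiny check illustrates the underflow (this only explains the warnings, not the crash itself):

// 4.9E-324 is Double.MIN_VALUE; narrowing it to float underflows to 0.0.
public class MinValueUnderflow {
    public static void main(String[] args) {
        double d = Double.MIN_VALUE;
        System.out.println(d);                              // prints 4.9E-324
        System.out.println((float) d);                      // prints 0.0 (below Float.MIN_VALUE, 1.4E-45)
        System.out.println(Float.parseFloat("4.9E-324"));   // also 0.0, the same kind of underflow the gmetad log complains about
    }
}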
Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
I would be glad to hear that too.. I've set up the following: Hadoop 0.20.205, Ganglia front end 3.1.7, Ganglia back end (gmetad) 3.1.7, RRDTool http://www.rrdtool.org/ 1.4.5 - I had some trouble installing 1.4.4. Ganglia works only when hadoop is not running, i.e. while no metrics are published to the gmetad node (configured with the new hadoop-metrics2.properties). When hadoop is started, a segmentation fault appears in the gmetad daemon: sudo gmetad -d 2 ... Updating host xxx, metric dfs.FSNamesystem.BlocksTotal Updating host xxx, metric bytes_in Updating host xxx, metric bytes_out Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time Created rrd /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd Segmentation fault And here is some info from the apache log: http://pastebin.com/nrqKRtKJ.. Can someone suggest a ganglia version that is tested with hadoop 0.20.205? I will try to sort it out, but it does not seem a trivial problem.. Thank you On 2 December 2011 12:32, praveenesh kumar praveen...@gmail.com wrote: or Do I have to apply some hadoop patch for this ? Thanks, Praveenesh
Configure hadoop scheduler
Hi, I am having problems with changing the default Hadoop scheduler (I assume the default scheduler is the FIFO scheduler). I am following the guide located in the hadoop/docs directory, but I am not able to get it running. The link for the scheduling administration page returns an HTTP 404 error ( http://localhost:50030/scheduler ). In the UI, under scheduling information, I can see only one queue named default. The mapred-site.xml file is being read, because when I change the jobtracker port I can see the daemon running on the changed port. The variable $HADOOP_CONFIG_DIR was added to .bashrc, but that did not solve the problem. I tried to rebuild Hadoop, manually place the fair scheduler jar in hadoop/lib and change the Hadoop classpath in hadoop-env.sh to point to the lib folder, but without success. The only scheduler-related info in the jobtracker log is the following: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1) I have been working on this for several days and am running out of ideas... I am wondering how to fix it and where to check the currently active scheduler parameters? Config files: mapred-site.xml http://pastebin.com/HmDfWqE1 allocation.xml http://pastebin.com/Uexq7uHV Tried versions: 0.20.203 and 204 Thank you
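One way to check the last point (where to see the currently active scheduler settings) is to read the same configuration the daemons load and print the relevant keys. This sketch assumes the 0.20/1.x property names mapred.jobtracker.taskScheduler and mapred.fairscheduler.allocation.file; if the first one still resolves to the default, the fair scheduler is simply not configured or its jar is not on the daemon's classpath.

// Print the scheduler-related keys from the same config the jobtracker reads.
import org.apache.hadoop.mapred.JobConf;

public class ShowSchedulerConf {
    public static void main(String[] args) {
        JobConf conf = new JobConf(); // loads mapred-site.xml from the conf dir on the classpath
        System.out.println("mapred.jobtracker.taskScheduler = "
                + conf.get("mapred.jobtracker.taskScheduler", "(unset, default JobQueueTaskScheduler)"));
        System.out.println("mapred.fairscheduler.allocation.file = "
                + conf.get("mapred.fairscheduler.allocation.file", "(unset)"));
    }
}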
Re: Desperate!!!! Expanding,shrinking cluster or replacing failed nodes.
I followed the same tutorial as you. If I am not wrong, the problem arises because you first ran the node as a single node and then joined it to the cluster (like Arpit mentioned). After checking that the new node works fine, try to delete the contents of the directory /app/hadoop/tmp/ and add the node to the cluster. When I set up the config files on the new node I followed this procedure:
DATANODE:
set up the config files (see the tutorial)
/usr/local/hadoop/bin/hadoop-daemon.sh start datanode
/usr/local/hadoop/bin/hadoop-daemon.sh start tasktracker
---
MASTER:
$hdbin/hadoop dfsadmin -report
nano /usr/local/hadoop/conf/slaves (add the new node)
$hdbin/hadoop dfsadmin -refreshNodes
$hdbin/hadoop namenode restart
$hdbin/hadoop jobtracker restart
($hdbin/hadoop balancer)
($hdbin/hadoop dfsadmin -report)
Hope it helps.. On 20 December 2011 18:38, Arpit Gupta ar...@hortonworks.com wrote: On the new nodes you are trying to add make sure the dfs/data directories are empty. You probably have a VERSION file from an older deploy and thus causing the incompatible namespaceId error. -- Arpit ar...@hortonworks.com On Dec 20, 2011, at 5:35 AM, Sloot, Hans-Peter wrote: But I ran into the java.io.IOException: Incompatible namespaceIDs error every time. Should I configure the files dfs/data/current/VERSION, dfs/name/current/VERSION and conf/*site.xml from other existing nodes? -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Tuesday 20 December 2011 14:30 To: common-user@hadoop.apache.org Cc: hdfs-...@hadoop.apache.org Subject: Re: Desperate Expanding,shrinking cluster or replacing failed nodes. Hans-Peter, Adding new nodes is simply (assuming network setup is sane and done): - Install/unpack services on new machine. - Deploy a config copy for the services. - Start the services. You should *not* format a NameNode *ever*, after the first time you start it up. Formatting loses all data of HDFS, so don't even think about that after the first time you use it :) On 20-Dec-2011, at 6:12 PM, Sloot, Hans-Peter wrote: Hello all, I have asked this question a couple of days ago but no one responded. I built a 6 node hadoop cluster, guided by Michael Noll, starting with a single node and expanding it one by one. Every time I expanded the cluster I ran into the error: java.io.IOException: Incompatible namespaceIDs So now my question is what is the correct procedure for expanding or shrinking a cluster? And how to replace a failed node? Can someone point me to the correct manuals. I have already looked at the available documents on the wiki and hadoop.apache.org but could not find the answers. Regards Hans-Peter
Re: TestFairScheduler failing - version 0.20. security 204
I figured out that if I run the tests from the console with ant test-fairscheduler (my modification of the test target in src/contrib/build.xml), all tests pass. If I understand this correctly, testing is always done with ant and the test files are never meant to be triggered from the Eclipse IDE. Because I am rather new to all of this, I would like to hear from you how you develop a new feature and how you test it. In my situation I would do it as follows:
- develop a new feature (make some code modifications)
- build the scheduler with ant
- write unit tests
- run the test class from ant
- deploy the new scheduler build/jar to a cluster
- try it on a working cluster
Is there any other way to try out new functionality locally, or in any other way? Any comments and suggestions are welcome. Thank you.. On 17 December 2011 21:58, Merto Mertek masmer...@gmail.com wrote: Hi, I am having some problems with running the following test file: org.apache.hadoop.mapred.TestFairScheduler Nearly all tests fail, most of them with the error: java.lang.RuntimeException: COULD NOT START JT. Here is a trace: http://pastebin.com/Jx90sYbw . The code was checked out from the svn branch, then I ran ant build and ant eclipse. The tests were run inside Eclipse. I would like to solve these problems before modifying the scheduler. Any hints appreciated. Probably just some config issue? Thank you
TestFairScheduler failing - version 0.20. security 204
Hi, I am having some problems with running the following test file: org.apache.hadoop.mapred.TestFairScheduler Nearly all tests fail, most of them with the error: java.lang.RuntimeException: COULD NOT START JT. Here is a trace: http://pastebin.com/Jx90sYbw . The code was checked out from the svn branch, then I ran ant build and ant eclipse. The tests were run inside Eclipse. I would like to solve these problems before modifying the scheduler. Any hints appreciated. Probably just some config issue? Thank you
Re: Environment consideration for a research on scheduling
The desktop edition was chosen just to run the namenode and to monitor cluster statistics. The worker nodes were chosen to run the ubuntu server edition because we found this configuration in several research papers. One such configuration can be found in the paper on the LATE scheduler (is some source code of this available, or is it integrated into the new fair scheduler?). Thanks for the suggested tools.. On 26 September 2011 11:41, Steve Loughran ste...@apache.org wrote: On 23/09/11 16:09, GOEKE, MATTHEW (AG/1000) wrote: If you are starting from scratch with no prior Hadoop install experience I would configure stand-alone, migrate to pseudo distributed and then to fully distributed verifying functionality at each step by doing a simple word count run. Also, if you don't mind using the CDH distribution then SCM / their rpms will greatly simplify both the bin installs as well as the user creation. Your VM route will most likely work but I can imagine the amount of hiccups during migration from that to the real cluster will not make it worth your time. Matt -Original Message- From: Merto Mertek [mailto:masmer...@gmail.com] Sent: Friday, September 23, 2011 10:00 AM To: common-user@hadoop.apache.org Subject: Environment consideration for a research on scheduling Hi, in the first phase we are planning to establish a small cluster with a few commodity computers (each 1GB, 200GB,..). The cluster would run ubuntu server 10.10 and a hadoop build from the branch 0.20.204 (I had some issues with version 0.20.203 with missing libraries: http://hadoop-common.472056.n3.nabble.com/Development-enviroment-problems-eclipse-hadoop-0-20-203-td3186022.html#a3188567 ). Would you suggest any other version? I wouldn't run to put Ubuntu 10.x on; they make good desktops, but RHEL and CentOS are the platform of choice on the server side. In the second phase we are planning to analyse, test and modify some of hadoop schedulers. The main schedulers used by Y! and FB are fairly tuned for their workloads, and not apparently something you'd want to play with. There is at least one other scheduler in the contribs/ dir to play with. The other thing about scheduling is that you may have a faster development cycle if, instead of working on a real cluster, you simulate it at multiples of real time, using stats collected from your own workload by way of the gridmix2 tools. I've never done scheduling work, but think there's some stuff there to do that. If not, it's a possible contribution. Be aware that the changes in 0.23+ will change resource scheduling; this may be a better place to do development with a plan to deploy in 2012. Oh, and get on the mapreduce lists, esp. the -dev list, to discuss issues. The information contained in this email may be subject to the export control laws and regulations of the United States, potentially including but not limited to the Export Administration Regulations (EAR) and sanctions regulations issued by the U.S. Department of Treasury, Office of Foreign Asset Controls (OFAC). As a recipient of this information you are obligated to comply with all applicable U.S. export laws and regulations. I have no idea what that means but am not convinced that reading an email forces me to comply with a different country's rules
Re: Environment consideration for a research on scheduling
I agree, we will go the standard route. Like you suggested, we will go step by step to the full cluster deployment. After configuring the first node we will use clonezilla to replicate it and then set the nodes up one by one.. On the worker nodes I was thinking of running ubuntu server; the namenode will run ubuntu desktop. I am interested in how I should configure the environment so that I will be able to remotely monitor, analyse and configure the cluster. I will run jobs from outside the local network via ssh to the namenode, but in that situation I will not be able to access the web interfaces of the jobtracker and tasktrackers. So I am wondering how to analyse them and how you configured your environment to be as practical as possible. For monitoring the cluster I saw that ganglia is one of the options, but at this stage of testing the job-history files will probably be enough.. On 23 September 2011 17:09, GOEKE, MATTHEW (AG/1000) matthew.go...@monsanto.com wrote: If you are starting from scratch with no prior Hadoop install experience I would configure stand-alone, migrate to pseudo distributed and then to fully distributed verifying functionality at each step by doing a simple word count run. Also, if you don't mind using the CDH distribution then SCM / their rpms will greatly simplify both the bin installs as well as the user creation. Your VM route will most likely work but I can imagine the amount of hiccups during migration from that to the real cluster will not make it worth your time. Matt -Original Message- From: Merto Mertek [mailto:masmer...@gmail.com] Sent: Friday, September 23, 2011 10:00 AM To: common-user@hadoop.apache.org Subject: Environment consideration for a research on scheduling Hi, in the first phase we are planning to establish a small cluster with a few commodity computers (each 1GB, 200GB,..). The cluster would run ubuntu server 10.10 and a hadoop build from the branch 0.20.204 (I had some issues with version 0.20.203 with missing libraries: http://hadoop-common.472056.n3.nabble.com/Development-enviroment-problems-eclipse-hadoop-0-20-203-td3186022.html#a3188567 ). Would you suggest any other version? In the second phase we are planning to analyse, test and modify some of the hadoop schedulers. Now I am interested in what the best way is to deploy ubuntu and hadoop to these few machines. I was thinking to configure the system in a local VM and then convert it to each physical machine, but probably this is not the best option. If you know any other way please share.. Thank you!
Unsubscribe from jira issues
Hi, I am receiving messages from two mailing lists (common-dev, common-user) and I would like to stop receiving the JIRA messages. I am not a member of the common-issues list. Can I disable this somehow? Thank you
Re: Unsubscribe from jira issues
Probably there is no option to disable just the JIRA issues.. I will probably need the common-dev list, so I will stay subscribed.. Thank you... On 23 September 2011 16:11, Harsh J ha...@cloudera.com wrote: Merto, You need common-dev-unsubscribe@ The common-dev list receives just JIRA opened/resolved/reopened messages. The common-issues receives everything. On Fri, Sep 23, 2011 at 7:27 PM, Merto Mertek masmer...@gmail.com wrote: Hi, I am receiving messages from two mailing lists (common-dev, common-user) and I would like to stop receiving the JIRA messages. I am not a member of the common-issues list. Can I disable this somehow? Thank you -- Harsh J
Re: Unsubscribe from jira issues
hehe :) you are right :) On 23 September 2011 16:21, Harsh J ha...@cloudera.com wrote: Merto, Am sure your mail client has some form of filtering available in that case! :-) On Fri, Sep 23, 2011 at 7:49 PM, Merto Mertek masmer...@gmail.com wrote: Probably there is not any option just to disable jira issues.. I will probably need the common-dev list so I will stay subscribed.. Thank you... On 23 September 2011 16:11, Harsh J ha...@cloudera.com wrote: Merto, You need common-dev-unsubscribe@ The common-dev list receives just JIRA opened/resolved/reopened messages. The common-issues receives everything. On Fri, Sep 23, 2011 at 7:27 PM, Merto Mertek masmer...@gmail.com wrote: Hi, i am receiving messages from two mailing lists (common-dev,common-user) and I would like to disable receiving msg from jira. I am not a member of common-issues-unsubscribe list. Can I anyhow disable this? Thank you -- Harsh J -- Harsh J
Environment consideration for a research on scheduling
Hi, in the first phase we are planning to establish a small cluster with a few commodity computers (each with 1 GB of RAM and a 200 GB disk). The cluster would run ubuntu server 10.10 and a hadoop build from the branch 0.20.204 (I had some issues with version 0.20.203 with missing libraries: http://hadoop-common.472056.n3.nabble.com/Development-enviroment-problems-eclipse-hadoop-0-20-203-td3186022.html#a3188567 ). Would you suggest any other version? In the second phase we are planning to analyse, test and modify some of the hadoop schedulers. Now I am interested in what the best way is to deploy ubuntu and hadoop to these few machines. I was thinking to configure the system in a local VM and then convert it to each physical machine, but probably this is not the best option. If you know any other way please share.. Thank you!