This is very likely because the AMI has an older version of Mesos. We should make a new AMI.
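
A rough way to confirm the mismatch is to look inside the Mesos jar on the AMI and check whether the class named in the NoSuchMethodError mentions getScalar at all. The sketch below is only a heuristic under stated assumptions: the helper name jar_mentions_method is made up for illustration, it assumes the protobuf classes ship in an ordinary jar, and it does a plain byte search for the method name instead of properly parsing the class file.

```python
import zipfile

# Class and method taken from the NoSuchMethodError in the quoted trace.
RESOURCE_CLASS = "org/apache/mesos/Protos$Resource.class"
METHOD_NAME = b"getScalar"


def jar_mentions_method(jar_path, class_entry=RESOURCE_CLASS, method=METHOD_NAME):
    """Return True if the class file inside the jar contains the method name.

    Hypothetical helper, not part of the Mesos scripts. Method names are
    stored as plain strings in a class file's constant pool, so a byte
    search is a crude but quick signal. It can give a false positive if the
    class only refers to a method of that name on another class.
    Returns False when the class entry is missing from the jar.
    """
    with zipfile.ZipFile(jar_path) as jar:
        try:
            data = jar.read(class_entry)
        except KeyError:
            # The jar does not contain the class at all.
            return False
    return method in data
```

Running this against the jar on the AMI and against the jar built from trunk should give different answers if the AMI's Mesos really is older; where those jars live is not specified here, so pass in whatever paths apply on your cluster.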
The -d git option in the script seems to be broken too, so we should fix that. In theory it would work… I think it broke when we switched the location of the repo (and maybe the internal structure too).

Matei

On Jan 27, 2012, at 9:36 PM, Matthew Rathbone wrote:

> When I spin up mesos using the ec2 scripts, and redeploy both hdfs and hadoop
> using cloudera's distribution I see this error when I try to start the
> jobtracker:
>
> 12/01/28 05:23:28 INFO util.HostsFileReader: Setting the includes file to
> 12/01/28 05:23:28 INFO util.HostsFileReader: Setting the excludes file to
> 12/01/28 05:23:28 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> 12/01/28 05:23:28 INFO mapred.JobTracker: Decommissioning 0 nodes
> 12/01/28 05:23:28 INFO mapred.FrameworkScheduler: Got resource offer value: "201201280508-0-5"
>
> Exception in thread "Thread-20" java.lang.NoSuchMethodError: org.apache.mesos.Protos$Resource.getScalar()Lorg/apache/mesos/Protos$Value$Scalar;
>     at org.apache.hadoop.mapred.FrameworkScheduler.getResource(FrameworkScheduler.java:176)
>     at org.apache.hadoop.mapred.FrameworkScheduler.getResource(FrameworkScheduler.java:183)
>     at org.apache.hadoop.mapred.FrameworkScheduler.resourceOffers(FrameworkScheduler.java:203)
>
> It seems to be stopping the job tracker from starting new tasks.
>
> I was wondering if this is a version conflict between the mesos I've built
> against (trunk), and the version of mesos used on the AMI? -- it seems to
> come from the generated protobuf library.
>
> To try and solve this, I attempted to spin up a cluster passing -d git (to
> have the latest code pulled from git, but then I get a string of crazy python
> exceptions:
>
> sync error: unexplained error (code 255) at /SourceCache/rsync/rsync-40/rsync/io.c(452) [sender=2.6.9]
> Traceback (most recent call last):
>   File "./mesos_ec2.py", line 541, in <module>
>     main()
>   File "./mesos_ec2.py", line 450, in main
>     setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, True)
>   File "./mesos_ec2.py", line 304, in setup_cluster
>     deploy_files(conn, "deploy." + opts.os, opts, master_nodes, slave_nodes, zoo_nodes)
>   File "./mesos_ec2.py", line 415, in deploy_files
>     subprocess.check_call(command, shell=True)
>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 462, in check_call
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command 'rsync -rv -e 'ssh -o StrictHostKeyChecking=no -i /Users/matthew/id-foursquare' '/var/folders/CK/CKzwG+5sFuSjDMUTvdmWfk+++TI/-Tmp-/tmpFmfdmB/' '[email protected]:/'' returned non-zero exit status 255
>
> Are version conflicts the likely reason for this failure do you think?
>
> --
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> [email protected] (mailto:[email protected]) | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
