Re: Mesos Modifying User Group

2015-08-13 Thread Steven Schlansker

On Aug 12, 2015, at 3:28 PM, Nastooh Avessta (navesta) nave...@cisco.com 
wrote:

 Having a bit of a strange problem with Mesos 0.22, running Spark 1.4.0, on 
 Docker 1.6 slaves. Part of my Spark program calls on a script that accesses a 
 GPU. I am able to run this script:
 1.   As Bash
 2.   Via Marathon
 3.   As part of a Spark program running as a standalone master
 However, when I try to run the same Spark program with Mesos as master, i.e., 
 spark-submit --master mesos://`cat /etc/mesos/zk` --deploy-mode client…, I 
 am not able to access dri devices, e.g., mfx init: /dev/dri/renderD128 fd 
 open failed. What seems to be happening is that the group membership of the 
 default user, in this case “ubuntu” is modified by Mesos, i.e., whereas under 
 cases 1-3, above, I get:
  
 $ id
 uid=1000(ubuntu) gid=1000(ubuntu) 
 groups=1000(ubuntu),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),102(netdev),999(docker)
 In case of Mesos, I get:
 uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),0(root)
  
 I am wondering if there are configuration parameters that can be passed to 
 Mesos to prevent it from modifying user groups?

Assuming your diagnosis here is correct,  this is actually a serious security 
issue -- notice how the group 0(root) was added!
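
To see exactly which groups were lost and which were gained, the two `id` outputs quoted above can be compared mechanically. The following is a minimal sketch; the helper names `parse_groups` and `diff_groups` are made up for illustration, and the two strings are copied from the report with the wrapped line joined.

```python
import re

# The two `id` outputs from the report above (wrapped line joined).
NORMAL_ID = (
    "uid=1000(ubuntu) gid=1000(ubuntu) "
    "groups=1000(ubuntu),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),"
    "29(audio),30(dip),44(video),46(plugdev),102(netdev),999(docker)"
)
MESOS_ID = "uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),0(root)"


def parse_groups(id_output):
    """Return {gid: name} parsed from the groups= field of `id` output."""
    match = re.search(r"groups=(\S+)", id_output)
    if match is None:
        return {}
    pairs = re.findall(r"(\d+)\(([^)]*)\)", match.group(1))
    return {int(gid): name for gid, name in pairs}


def diff_groups(expected_output, actual_output):
    """Return (lost, gained) group names between two `id` outputs."""
    expected = parse_groups(expected_output)
    actual = parse_groups(actual_output)
    lost = sorted(expected[g] for g in expected.keys() - actual.keys())
    gained = sorted(actual[g] for g in actual.keys() - expected.keys())
    return lost, gained


lost, gained = diff_groups(NORMAL_ID, MESOS_ID)
print("lost:  ", ", ".join(lost))
print("gained:", ", ".join(gained))
```

The lost list includes video, which would be consistent with the /dev/dri failure if that device is group-owned by video on the slave (an assumption, not something the report states). The gained list is the single root group flagged above.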



Re: Mesos Whitelist syntax

2015-08-13 Thread haosdent
Hi @Jeremy, in the whitelist file you need to list every IP explicitly, one
per line. If you don't specify --whitelist, or use --whitelist=*, all IPs
are accepted.

On Thu, Aug 13, 2015 at 6:49 AM, Jeremy Olexa jol...@spscommerce.com
wrote:

 Hello all,


 I've started up my mesos cluster with --whitelist=/tmp/mesos-whitelist.txt


 My question, is there a special syntax to achieve the default behavior of
 all offers accepted ? It seems that an empty file doesn't work nor does *
 - only the explicit IP (one per line). I can't find this syntax in the docs
 (completely willing to submit a PR, by the way). Just thought I would ask
 here before implementing some crazy script solution.


 I0812 11:01:22.890257 10920 hierarchical.hpp:635] Updated slave whitelist:
 {  }

 W0812 11:01:22.890270 10920 hierarchical.hpp:638] Whitelist is empty, no
 offers will be made!

 I0812 11:05:37.927624 10918 hierarchical.hpp:635] Updated slave whitelist:
 { * }

 I0812 11:13:42.996763 10920 hierarchical.hpp:635] Updated slave whitelist:
 { 10.66.69.19 }


 Thanks much,

 Jeremy




-- 
Best Regards,
Haosdent Huang


Re: Mesos Modifying User Group

2015-08-13 Thread John Omernik
I ran into this same issue.  For me it manifested as weird "permission
denied" errors in MapR's NFS implementation; running in bash, etc. was fine,
but running on Mesos didn't work (permission denied). (Also, thank you to
MapR for helping me troubleshoot.)  Good news: there is a patch.

https://issues.apache.org/jira/browse/MESOS-719

And it's fixed in Mesos 0.23.  I applied the patch and recompiled and it
worked great, and when I installed 0.23, it also worked great.

Good luck.

John

On Wed, Aug 12, 2015 at 5:28 PM, Nastooh Avessta (navesta) 
nave...@cisco.com wrote:

 Having a bit of a strange problem with Mesos 0.22, running Spark 1.4.0, on
 Docker 1.6 slaves. Part of my Spark program calls on a script that accesses
 a GPU. I am able to run this script:

 1.   As Bash

 2.   Via Marathon

 3.   As part of a Spark program running as a standalone master

 However, when I try to run the same Spark program with Mesos as master,
 i.e., spark-submit --master mesos://`cat /etc/mesos/zk` --deploy-mode
 client…, I am not able to access dri devices, e.g., mfx init:
 /dev/dri/renderD128 fd open failed. What seems to be happening is that the
 group membership of the default user, in this case “ubuntu” is modified by
 Mesos, i.e., whereas under cases 1-3, above, I get:



 $ id

 uid=1000(ubuntu) gid=1000(ubuntu)
 groups=1000(ubuntu),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),102(netdev),999(docker)

 In case of Mesos, I get:

 uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),0(root)



 I am wondering if there are configuration parameters that can be passed to
 Mesos to prevent it from modifying user groups?



 Cheers,

 *Nastooh Avessta*
 ENGINEER.SOFTWARE ENGINEERING
 nave...@cisco.com
 Phone: *+1 604 647 1527*

 *Cisco Systems Limited*
 595 Burrard Street, Suite 2123 Three Bentall Centre, PO Box 49121
 VANCOUVER
 BRITISH COLUMBIA
 V7X 1J1
 CA
 Cisco.com http://www.cisco.com/



 This email may contain confidential and privileged material for the sole
 use of the intended recipient. Any review, use, distribution or disclosure
 by others is strictly prohibited. If you are not the intended recipient (or
 authorized to receive for the recipient), please contact the sender by
 reply email and delete all copies of this message.

 For corporate legal information go to:
 http://www.cisco.com/web/about/doing_business/legal/cri/index.html

 Cisco Systems Canada Co, 181 Bay St., Suite 3400, Toronto, ON, Canada, M5J
 2T3. Phone: 416-306-7000; Fax: 416-306-7099. *Preferences
 http://www.cisco.com/offer/subscribe/?sid=000478326 - Unsubscribe
 http://www.cisco.com/offer/unsubscribe/?sid=000478327 – Privacy
 http://www.cisco.com/web/siteassets/legal/privacy.html*





Launching tasks with reserved resources

2015-08-13 Thread Gidon Gershinsky
I have a simple setup where a framework runs with a role, and some 
resources are reserved in the cluster for that role.
The resource offers arrive at the framework as a list of two resource 
sets: one general (cpus(*), etc.) and one specific to the role 
(cpus(role1), etc.).

So far so good. If two tasks are launched, each with one of the two 
resource sets, things work.

But problems start when I need to launch multiple smaller tasks (with a 
total resource consumption equal to what was offered). I do this by creating 
resource objects and attaching them to tasks, using calls from the 
standard Mesos samples (Python):
    from mesos.interface import mesos_pb2   # Mesos >= 0.20 Python bindings

    task = mesos_pb2.TaskInfo()
    cpus = task.resources.add()
    cpus.name = "cpus"                    # the resource name is a string
    cpus.type = mesos_pb2.Value.SCALAR    # set the type, as the samples do
    cpus.scalar.value = TASK_CPUS         # TASK_CPUS: CPUs per task, defined elsewhere

while checking that the total doesn't surpass the offered resources. This starts 
fine, but soon I get TASK_ERROR messages, because the master's validator finds 
that the tasks request more resources than are available in the offer. 
This obviously happens because all task resources, as defined above, come 
with the (*) role, while the offer resources are split between * and role1! 
OK, so then I assign a role to the task resources by adding
    cpus.role = "role1"

But this fails again, for the same reason.

Shouldn't this work differently? When a resource offer is received by a 
framework with role1, why should it care which part is 'unreserved' 
and which part is reserved for role1? When a task launch request is 
received by the master from a framework with a role, why can't it check 
only the total resource amount, instead of treating unreserved and 
reserved resources separately? They are reserved for this role anyway. Or 
am I missing something?


Regards, 
Gidon
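
One way to stay within what the validator accepts, assuming (as the TASK_ERROR described above suggests) that each task's resources must be covered by offer resources of the same role, is to carve every task's request out of the reserved and unreserved pools explicitly. The sketch below uses plain Python lists instead of mesos_pb2 objects so that it runs on its own; the `carve` helper and the pool layout are hypothetical, not part of the Mesos API.

```python
def carve(pools, amount):
    """Take `amount` of one scalar resource from `pools`.

    `pools` is an ordered list of [role, available] entries; earlier
    entries are consumed first. Returns a list of (role, value) pieces
    that sum to `amount`, or None if the pools cannot cover it (in which
    case `pools` is left untouched).
    """
    eps = 1e-9
    if sum(avail for _, avail in pools) + eps < amount:
        return None
    pieces = []
    remaining = amount
    for pool in pools:
        if remaining <= eps:
            break
        take = min(pool[1], remaining)
        if take > eps:
            pool[1] -= take
            remaining -= take
            pieces.append((pool[0], take))
    return pieces


# Hypothetical offer: 1.5 cpus reserved for role1 plus 2.5 unreserved cpus.
offer_cpus = [["role1", 1.5], ["*", 2.5]]

# Launch 1-cpu tasks until the offer is used up.
while True:
    pieces = carve(offer_cpus, 1.0)
    if pieces is None:
        break
    print(pieces)
```

Each (role, value) piece would become its own entry in task.resources, with cpus.role set to the piece's role, so a single task may carry both a role1 part and a * part.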





Re: Can't start master properly (stale state issue?); help!

2015-08-13 Thread haosdent
Hello, how do you start the master? Also, could you try netstat -antp | grep
5050 to check whether multiple master processes are running on the same
machine?

On Thu, Aug 13, 2015 at 10:37 PM, Paul Bell arach...@gmail.com wrote:

 Hi All,

 I hope someone can shed some light on this because I'm getting desperate!

 I try to start components zk, mesos-master, and marathon in that order.
 They are started via a program that SSHs to the sole host and does service
 xxx start. Everyone starts happily enough. But the Mesos UI shows me:

 *This master is not the leader, redirecting in 0 seconds ... go now*

 The pattern seen in all of the mesos-master.INFO logs (one of which shown
 below) is that the mesos-master with the correct IP@ starts. But then a
 new leader is detected and becomes leading master. This new leader shows
 UPID *(UPID=master@127.0.1.1:5050)*

 I've tried clearing what ZK and mesos-master state I can find, but this
 problem will not go away.

 Would someone be so kind as to a) explain what is happening here and b)
 suggest remedies?

 Thanks very much.

 -Paul


 Log file created at: 2015/08/13 10:19:43
 Running on machine: 71.100.14.9
 Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
 I0813 10:19:43.225636  2542 logging.cpp:172] INFO level logging started!
 I0813 10:19:43.235213  2542 main.cpp:181] Build: 2015-05-05 06:15:50 by
 root
 I0813 10:19:43.235244  2542 main.cpp:183] Version: 0.22.1
 I0813 10:19:43.235257  2542 main.cpp:186] Git tag: 0.22.1
 I0813 10:19:43.235268  2542 main.cpp:190] Git SHA:
 d6309f92a7f9af3ab61a878403e3d9c284ea87e0
 I0813 10:19:43.245098  2542 leveldb.cpp:176] Opened db in 9.386828ms
 I0813 10:19:43.247138  2542 leveldb.cpp:183] Compacted db in 1.956669ms
 I0813 10:19:43.247194  2542 leveldb.cpp:198] Created db iterator in 13961ns
 I0813 10:19:43.247206  2542 leveldb.cpp:204] Seeked to beginning of db in
 677ns
 I0813 10:19:43.247215  2542 leveldb.cpp:273] Iterated through 0 keys in
 the db in 243ns
 I0813 10:19:43.247252  2542 replica.cpp:744] Replica recovered with log
 positions 0 - 0 with 1 holes and 0 unlearned
 I0813 10:19:43.248755  2611 log.cpp:238] Attempting to join replica to
 ZooKeeper group
 I0813 10:19:43.248924  2542 main.cpp:306] Starting Mesos master
 I0813 10:19:43.249244  2612 recover.cpp:449] Starting replica recovery
 I0813 10:19:43.250239  2612 recover.cpp:475] Replica is in EMPTY status
 I0813 10:19:43.250819  2612 replica.cpp:641] Replica in EMPTY status
 received a broadcasted recover request
 I0813 10:19:43.251014  2607 recover.cpp:195] Received a recover response
 from a replica in EMPTY status
 *I0813 10:19:43.249503  2542 master.cpp:349] Master
 20150813-101943-151938119-5050-2542 (71.100.14.9) started on
 71.100.14.9:5050*
 I0813 10:19:43.252053  2610 recover.cpp:566] Updating replica status to
 STARTING
 I0813 10:19:43.252571  2542 master.cpp:397] Master allowing
 unauthenticated frameworks to register
 I0813 10:19:43.253159  2542 master.cpp:402] Master allowing
 unauthenticated slaves to register
 I0813 10:19:43.254276  2612 leveldb.cpp:306] Persisting metadata (8 bytes)
 to leveldb took 1.816161ms
 I0813 10:19:43.254323  2612 replica.cpp:323] Persisted replica status to
 STARTING
 I0813 10:19:43.254905  2612 recover.cpp:475] Replica is in STARTING status
 I0813 10:19:43.255203  2612 replica.cpp:641] Replica in STARTING status
 received a broadcasted recover request
 I0813 10:19:43.255265  2612 recover.cpp:195] Received a recover response
 from a replica in STARTING status
 I0813 10:19:43.255343  2612 recover.cpp:566] Updating replica status to
 VOTING
 I0813 10:19:43.258730  2611 master.cpp:1295] Successfully attached file
 '/var/log/mesos/mesos-master.INFO'
 I0813 10:19:43.258760  2609 contender.cpp:131] Joining the ZK group
 I0813 10:19:43.258862  2612 leveldb.cpp:306] Persisting metadata (8 bytes)
 to leveldb took 3.477458ms
 I0813 10:19:43.258894  2612 replica.cpp:323] Persisted replica status to
 VOTING
 I0813 10:19:43.258934  2612 recover.cpp:580] Successfully joined the Paxos
 group
 I0813 10:19:43.258987  2612 recover.cpp:464] Recover process terminated
 I0813 10:19:46.590340  2606 group.cpp:313] Group process (group(1)@
 71.100.14.9:5050) connected to ZooKeeper
 I0813 10:19:46.590373  2606 group.cpp:790] Syncing group operations: queue
 size (joins, cancels, datas) = (0, 0, 0)
 I0813 10:19:46.590386  2606 group.cpp:385] Trying to create path
 '/mesos/log_replicas' in ZooKeeper
 I0813 10:19:46.591442  2606 network.hpp:424] ZooKeeper group memberships
 changed
 I0813 10:19:46.591514  2606 group.cpp:659] Trying to get
 '/mesos/log_replicas/00' in ZooKeeper
 I0813 10:19:46.592146  2606 group.cpp:659] Trying to get
 '/mesos/log_replicas/01' in ZooKeeper
 I0813 10:19:46.593128  2608 network.hpp:466] ZooKeeper group PIDs: {
 log-replica(1)@127.0.1.1:5050 }
 I0813 10:19:46.593955  2608 group.cpp:313] Group process (group(2)@
 71.100.14.9:5050

RE: Mesos Modifying User Group

2015-08-13 Thread Nastooh Avessta (navesta)
0.23, here I come.
Thanks John, will install 0.23 and retest.
Cheers,

Nastooh Avessta
ENGINEER.SOFTWARE ENGINEERING
nave...@cisco.com
Phone: +1 604 647 1527

Cisco Systems Limited
595 Burrard Street, Suite 2123 Three Bentall Centre, PO Box 49121
VANCOUVER
BRITISH COLUMBIA
V7X 1J1
CA
Cisco.com http://www.cisco.com/






From: John Omernik [mailto:j...@omernik.com]
Sent: Thursday, August 13, 2015 5:02 AM
To: user@mesos.apache.org
Subject: Re: Mesos Modifying User Group

I ran into this same issue.  For me it manifested as weird "permission denied" 
errors in MapR's NFS implementation; running in bash, etc. was fine, but 
running on Mesos didn't work (permission denied). (Also, thank you to MapR for 
helping me troubleshoot.)  Good news: there is a patch.

https://issues.apache.org/jira/browse/MESOS-719

And it's fixed in Mesos 0.23.  I applied the patch and recompiled and it worked 
great, and when I installed 0.23, it also worked great.

Good luck.

John

On Wed, Aug 12, 2015 at 5:28 PM, Nastooh Avessta (navesta) 
nave...@cisco.com wrote:
Having a bit of a strange problem with Mesos 0.22, running Spark 1.4.0, on 
Docker 1.6 slaves. Part of my Spark program calls on a script that accesses a 
GPU. I am able to run this script:

1.   As Bash

2.   Via Marathon

3.   As part of a Spark program running as a standalone master
However, when I try to run the same Spark program with Mesos as master, i.e., 
spark-submit --master mesos://`cat /etc/mesos/zk` --deploy-mode client…, I am 
not able to access dri devices, e.g., mfx init: /dev/dri/renderD128 fd open 
failed. What seems to be happening is that the group membership of the default 
user, in this case “ubuntu” is modified by Mesos, i.e., whereas under cases 
1-3, above, I get:

$ id
uid=1000(ubuntu) gid=1000(ubuntu) 
groups=1000(ubuntu),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),102(netdev),999(docker)
In case of Mesos, I get:
uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),0(root)

I am wondering if there are configuration parameters that can be passed to 
Mesos to prevent it from modifying user groups?

Cheers,
Nastooh Avessta




Re: Can't start master properly (stale state issue?); help!

2015-08-13 Thread Marco Massenzio
To be really sure about the possible root cause, I'd need to know how you
installed Mesos on your server. If it's via Mesosphere packages, the
configuration is described here:
https://open.mesosphere.com/reference/packages/

I am almost[0] sure the behavior you are seeing has something to do with how
the server resolves the hostname to an IP for your Master: unless you give an
explicit IP address to bind to (--ip), libprocess will look up the hostname,
reverse-DNS it, and resolve to an IP address; if that fails, it falls back
to localhost.

If you want to try a quick hack, you can run `cat /etc/hostname` on that
server, and add a line in /etc/hosts that resolves that name to the actual
IP address (71.100.14.9, in your logs).
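
Before editing /etc/hosts, it can help to check what the hostname currently resolves to. The sketch below is only an approximation using the Python standard library, not a reproduction of libprocess's own lookup; the function names are made up for illustration.

```python
import ipaddress
import socket


def is_loopback(ip):
    """True for any 127.0.0.0/8 (or ::1) address, including 127.0.1.1."""
    return ipaddress.ip_address(ip).is_loopback


def resolved_hostname_ip():
    """Resolve this machine's own hostname with a plain forward lookup."""
    return socket.gethostbyname(socket.gethostname())


if __name__ == "__main__":
    try:
        ip = resolved_hostname_ip()
    except OSError as err:
        print("hostname did not resolve: %s" % err)
    else:
        if is_loopback(ip):
            print("hostname resolves to loopback address %s; check /etc/hosts" % ip)
        else:
            print("hostname resolves to %s" % ip)
```

If this reports a loopback address such as 127.0.1.1, a master started without --ip may well be advertising that same address.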

The other possibility is that it's really a 'stale state' in ZK - you can
either drop the znode (whichever you used for the --zk path) or launch with
a different one.

Finally, if you have the option, try running the master without `service
start`, by SSH'ing into the server and doing something like:

/path/to/install/bin/mesos-master.sh --quorum=1 --work_dir=/tmp/mesos
--zk=zk://ZK-IP:ZK-PORT/mesos/test --ip=71.100.14.9

and see whether that works.

If none of the above helps, please let us know what you see and we'll keep
debugging it :)

BTW - the new leading master is a bit of a logging decoy, it's not
actually new per se - so I'm almost[0] sure the leader never changed.

[0] almost as this line confuses me:
I0813 10:19:46.601297  2612 network.hpp:466] ZooKeeper group PIDs: {
log-replica(1)@127.0.1.1:5050, log-replica(1)@71.100.14.9:5050 }
(but that's because of my lack of deep understanding of how the
log-replicas work)

*Marco Massenzio*
*Distributed Systems Engineer*

On Thu, Aug 13, 2015 at 7:37 AM, Paul Bell arach...@gmail.com wrote:

 Hi All,

 I hope someone can shed some light on this because I'm getting desperate!

 I try to start components zk, mesos-master, and marathon in that order.
 They are started via a program that SSHs to the sole host and does service
 xxx start. Everyone starts happily enough. But the Mesos UI shows me:

 *This master is not the leader, redirecting in 0 seconds ... go now*

 The pattern seen in all of the mesos-master.INFO logs (one of which shown
 below) is that the mesos-master with the correct IP@ starts. But then a
 new leader is detected and becomes leading master. This new leader shows
 UPID *(UPID=master@127.0.1.1:5050)*

 I've tried clearing what ZK and mesos-master state I can find, but this
 problem will not go away.

 Would someone be so kind as to a) explain what is happening here and b)
 suggest remedies?

 Thanks very much.

 -Paul


 Log file created at: 2015/08/13 10:19:43
 Running on machine: 71.100.14.9
 Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
 I0813 10:19:43.225636  2542 logging.cpp:172] INFO level logging started!
 I0813 10:19:43.235213  2542 main.cpp:181] Build: 2015-05-05 06:15:50 by
 root
 I0813 10:19:43.235244  2542 main.cpp:183] Version: 0.22.1
 I0813 10:19:43.235257  2542 main.cpp:186] Git tag: 0.22.1
 I0813 10:19:43.235268  2542 main.cpp:190] Git SHA:
 d6309f92a7f9af3ab61a878403e3d9c284ea87e0
 I0813 10:19:43.245098  2542 leveldb.cpp:176] Opened db in 9.386828ms
 I0813 10:19:43.247138  2542 leveldb.cpp:183] Compacted db in 1.956669ms
 I0813 10:19:43.247194  2542 leveldb.cpp:198] Created db iterator in 13961ns
 I0813 10:19:43.247206  2542 leveldb.cpp:204] Seeked to beginning of db in
 677ns
 I0813 10:19:43.247215  2542 leveldb.cpp:273] Iterated through 0 keys in
 the db in 243ns
 I0813 10:19:43.247252  2542 replica.cpp:744] Replica recovered with log
 positions 0 - 0 with 1 holes and 0 unlearned
 I0813 10:19:43.248755  2611 log.cpp:238] Attempting to join replica to
 ZooKeeper group
 I0813 10:19:43.248924  2542 main.cpp:306] Starting Mesos master
 I0813 10:19:43.249244  2612 recover.cpp:449] Starting replica recovery
 I0813 10:19:43.250239  2612 recover.cpp:475] Replica is in EMPTY status
 I0813 10:19:43.250819  2612 replica.cpp:641] Replica in EMPTY status
 received a broadcasted recover request
 I0813 10:19:43.251014  2607 recover.cpp:195] Received a recover response
 from a replica in EMPTY status
 *I0813 10:19:43.249503  2542 master.cpp:349] Master
 20150813-101943-151938119-5050-2542 (71.100.14.9) started on
 71.100.14.9:5050*
 I0813 10:19:43.252053  2610 recover.cpp:566] Updating replica status to
 STARTING
 I0813 10:19:43.252571  2542 master.cpp:397] Master allowing
 unauthenticated frameworks to register
 I0813 10:19:43.253159  2542 master.cpp:402] Master allowing
 unauthenticated slaves to register
 I0813 10:19:43.254276  2612 leveldb.cpp:306] Persisting metadata (8 bytes)
 to leveldb took 1.816161ms
 I0813 10:19:43.254323  2612 replica.cpp:323] Persisted replica status to
 STARTING
 I0813 10:19:43.254905  2612 recover.cpp:475] Replica is in STARTING status
 I0813 10:19:43.255203  2612 replica.cpp:641] Replica in STARTING status

Can't start master properly (stale state issue?); help!

2015-08-13 Thread Paul Bell
Hi All,

I hope someone can shed some light on this because I'm getting desperate!

I try to start components zk, mesos-master, and marathon in that order.
They are started via a program that SSHs to the sole host and does service
xxx start. Everyone starts happily enough. But the Mesos UI shows me:

*This master is not the leader, redirecting in 0 seconds ... go now*

The pattern seen in all of the mesos-master.INFO logs (one of which shown
below) is that the mesos-master with the correct IP@ starts. But then a
new leader is detected and becomes leading master. This new leader shows
UPID *(UPID=master@127.0.1.1:5050)*

I've tried clearing what ZK and mesos-master state I can find, but this
problem will not go away.

Would someone be so kind as to a) explain what is happening here and b)
suggest remedies?

Thanks very much.

-Paul


Log file created at: 2015/08/13 10:19:43
Running on machine: 71.100.14.9
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
I0813 10:19:43.225636  2542 logging.cpp:172] INFO level logging started!
I0813 10:19:43.235213  2542 main.cpp:181] Build: 2015-05-05 06:15:50 by root
I0813 10:19:43.235244  2542 main.cpp:183] Version: 0.22.1
I0813 10:19:43.235257  2542 main.cpp:186] Git tag: 0.22.1
I0813 10:19:43.235268  2542 main.cpp:190] Git SHA:
d6309f92a7f9af3ab61a878403e3d9c284ea87e0
I0813 10:19:43.245098  2542 leveldb.cpp:176] Opened db in 9.386828ms
I0813 10:19:43.247138  2542 leveldb.cpp:183] Compacted db in 1.956669ms
I0813 10:19:43.247194  2542 leveldb.cpp:198] Created db iterator in 13961ns
I0813 10:19:43.247206  2542 leveldb.cpp:204] Seeked to beginning of db in
677ns
I0813 10:19:43.247215  2542 leveldb.cpp:273] Iterated through 0 keys in the
db in 243ns
I0813 10:19:43.247252  2542 replica.cpp:744] Replica recovered with log
positions 0 - 0 with 1 holes and 0 unlearned
I0813 10:19:43.248755  2611 log.cpp:238] Attempting to join replica to
ZooKeeper group
I0813 10:19:43.248924  2542 main.cpp:306] Starting Mesos master
I0813 10:19:43.249244  2612 recover.cpp:449] Starting replica recovery
I0813 10:19:43.250239  2612 recover.cpp:475] Replica is in EMPTY status
I0813 10:19:43.250819  2612 replica.cpp:641] Replica in EMPTY status
received a broadcasted recover request
I0813 10:19:43.251014  2607 recover.cpp:195] Received a recover response
from a replica in EMPTY status
*I0813 10:19:43.249503  2542 master.cpp:349] Master
20150813-101943-151938119-5050-2542 (71.100.14.9) started on
71.100.14.9:5050*
I0813 10:19:43.252053  2610 recover.cpp:566] Updating replica status to
STARTING
I0813 10:19:43.252571  2542 master.cpp:397] Master allowing unauthenticated
frameworks to register
I0813 10:19:43.253159  2542 master.cpp:402] Master allowing unauthenticated
slaves to register
I0813 10:19:43.254276  2612 leveldb.cpp:306] Persisting metadata (8 bytes)
to leveldb took 1.816161ms
I0813 10:19:43.254323  2612 replica.cpp:323] Persisted replica status to
STARTING
I0813 10:19:43.254905  2612 recover.cpp:475] Replica is in STARTING status
I0813 10:19:43.255203  2612 replica.cpp:641] Replica in STARTING status
received a broadcasted recover request
I0813 10:19:43.255265  2612 recover.cpp:195] Received a recover response
from a replica in STARTING status
I0813 10:19:43.255343  2612 recover.cpp:566] Updating replica status to
VOTING
I0813 10:19:43.258730  2611 master.cpp:1295] Successfully attached file
'/var/log/mesos/mesos-master.INFO'
I0813 10:19:43.258760  2609 contender.cpp:131] Joining the ZK group
I0813 10:19:43.258862  2612 leveldb.cpp:306] Persisting metadata (8 bytes)
to leveldb took 3.477458ms
I0813 10:19:43.258894  2612 replica.cpp:323] Persisted replica status to
VOTING
I0813 10:19:43.258934  2612 recover.cpp:580] Successfully joined the Paxos
group
I0813 10:19:43.258987  2612 recover.cpp:464] Recover process terminated
I0813 10:19:46.590340  2606 group.cpp:313] Group process (group(1)@
71.100.14.9:5050) connected to ZooKeeper
I0813 10:19:46.590373  2606 group.cpp:790] Syncing group operations: queue
size (joins, cancels, datas) = (0, 0, 0)
I0813 10:19:46.590386  2606 group.cpp:385] Trying to create path
'/mesos/log_replicas' in ZooKeeper
I0813 10:19:46.591442  2606 network.hpp:424] ZooKeeper group memberships
changed
I0813 10:19:46.591514  2606 group.cpp:659] Trying to get
'/mesos/log_replicas/00' in ZooKeeper
I0813 10:19:46.592146  2606 group.cpp:659] Trying to get
'/mesos/log_replicas/01' in ZooKeeper
I0813 10:19:46.593128  2608 network.hpp:466] ZooKeeper group PIDs: {
log-replica(1)@127.0.1.1:5050 }
I0813 10:19:46.593955  2608 group.cpp:313] Group process (group(2)@
71.100.14.9:5050) connected to ZooKeeper
I0813 10:19:46.593977  2608 group.cpp:790] Syncing group operations: queue
size (joins, cancels, datas) = (1, 0, 0)
I0813 10:19:46.593986  2608 group.cpp:385] Trying to create path
'/mesos/log_replicas' in ZooKeeper
I0813 10:19:46.594894  2605 group.cpp:313] Group process (group(3)@
71.100.14.9:5050

Re: Mesos Whitelist syntax

2015-08-13 Thread Christos Kozyrakis
hit state.json in the master for info on all slaves and all tasks. 
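
As a rough sketch of that approach: once the master's state.json response has been fetched and parsed, the slave IPs can be pulled out of each slave's pid and written one per line, which is the whitelist file format discussed in this thread. The code assumes each entry under "slaves" has a libprocess-style "pid" field such as slave(1)@IP:PORT. The sample document is a hypothetical, trimmed-down response, and the helper names are made up for illustration.

```python
import json
import re


def slave_ips(state):
    """Collect the IP of every registered slave from a parsed state.json.

    Assumes each entry in state["slaves"] carries a "pid" such as
    "slave(1)@10.66.69.19:5051".
    """
    ips = []
    for slave in state.get("slaves", []):
        match = re.search(r"@([^:]+):\d+$", slave.get("pid", ""))
        if match:
            ips.append(match.group(1))
    return sorted(set(ips))


def whitelist_text(ips):
    """One IP per line, as in the whitelist file discussed in this thread."""
    return "".join(ip + "\n" for ip in ips)


# Hypothetical, trimmed-down response used only for illustration.
SAMPLE_STATE = json.loads("""
{"slaves": [
  {"id": "S0", "hostname": "node-a", "pid": "slave(1)@10.66.69.19:5051"},
  {"id": "S1", "hostname": "node-b", "pid": "slave(1)@10.66.69.20:5051"}
]}
""")

print(whitelist_text(slave_ips(SAMPLE_STATE)), end="")
```

Because this only talks to the master, it does not depend on any datacenter or cloud provider API. Note that it lists only slaves that are currently registered.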

 On Aug 13, 2015, at 5:04 PM, Jeremy Olexa jol...@spscommerce.com wrote:
 
 Ok, thanks for the feedback. Does anyone know of an intelligent way to gather 
 all registered slave IP addresses? The method should be datacenter 
 independent or not tied to a particular public cloud api, for example.
 
 
 From: haosdent haosd...@gmail.com
 Sent: Thursday, August 13, 2015 1:56 AM
 To: user@mesos.apache.org
 Subject: Re: Mesos Whitelist syntax
  
 Hi @Jeremy, in the whitelist file you need to list every IP explicitly, one 
 per line. If you don't specify --whitelist, or use --whitelist=*, all IPs 
 are accepted.
 
 On Thu, Aug 13, 2015 at 6:49 AM, Jeremy Olexa jol...@spscommerce.com wrote:
 Hello all,
 
 I've started up my mesos cluster with --whitelist=/tmp/mesos-whitelist.txt
 
 My question, is there a special syntax to achieve the default behavior of 
 all offers accepted ? It seems that an empty file doesn't work nor does * - 
 only the explicit IP (one per line). I can't find this syntax in the docs 
 (completely willing to submit a PR, by the way). Just thought I would ask 
 here before implementing some crazy script solution.
 
 I0812 11:01:22.890257 10920 hierarchical.hpp:635] Updated slave whitelist: {  
 }
 W0812 11:01:22.890270 10920 hierarchical.hpp:638] Whitelist is empty, no 
 offers will be made!
 I0812 11:05:37.927624 10918 hierarchical.hpp:635] Updated slave whitelist: { 
 * }
 I0812 11:13:42.996763 10920 hierarchical.hpp:635] Updated slave whitelist: { 
 10.66.69.19 }
 
 Thanks much,
 Jeremy
 
 
 
 -- 
 Best Regards,
 Haosdent Huang



Re: Mesos Whitelist syntax

2015-08-13 Thread Jeremy Olexa
Ok, thanks for the feedback. Does anyone know of an intelligent way to gather 
all registered slave IP addresses? The method should be datacenter independent 
or not tied to a particular public cloud api, for example.



From: haosdent haosd...@gmail.com
Sent: Thursday, August 13, 2015 1:56 AM
To: user@mesos.apache.org
Subject: Re: Mesos Whitelist syntax

Hi @Jeremy, in the whitelist file you need to list every IP explicitly, one 
per line. If you don't specify --whitelist, or use --whitelist=*, all IPs are 
accepted.

On Thu, Aug 13, 2015 at 6:49 AM, Jeremy Olexa jol...@spscommerce.com wrote:

Hello all,


I've started up my mesos cluster with --whitelist=/tmp/mesos-whitelist.txt


My question, is there a special syntax to achieve the default behavior of all 
offers accepted ? It seems that an empty file doesn't work nor does * - only 
the explicit IP (one per line). I can't find this syntax in the docs 
(completely willing to submit a PR, by the way). Just thought I would ask here 
before implementing some crazy script solution.


I0812 11:01:22.890257 10920 hierarchical.hpp:635] Updated slave whitelist: {  }

W0812 11:01:22.890270 10920 hierarchical.hpp:638] Whitelist is empty, no offers 
will be made!

I0812 11:05:37.927624 10918 hierarchical.hpp:635] Updated slave whitelist: { * }

I0812 11:13:42.996763 10920 hierarchical.hpp:635] Updated slave whitelist: { 
10.66.69.19 }


Thanks much,

Jeremy



--
Best Regards,
Haosdent Huang


Re: Can't start master properly (stale state issue?); help!

2015-08-13 Thread Paul Bell
Marco & haosdent,

This is just a quick note to say thank you for your replies.

I will answer you much more fully tomorrow, but for now can only manage a
few quick observations & questions:

1. Having some months ago encountered a known problem with the IP@
127.0.1.1 (I'll provide references tomorrow), I early on configured
/etc/hosts, replacing myHostName 127.0.1.1 with myHostName Real_IP.
That said, I can't rule out a race condition whereby ZK | mesos-master saw
the original unchanged /etc/hosts before I zapped it.

2. What is a znode and how would I drop it?

I start the services (zk, master, marathon; all on same host) by SSHing
into the host & doing service  start commands.

Again, thanks very much; and more tomorrow.

Cordially,

Paul

On Thu, Aug 13, 2015 at 1:08 PM, haosdent haosd...@gmail.com wrote:

 Hello, how do you start the master? Also, could you try netstat -antp | grep
 5050 to check whether multiple master processes are running on the same
 machine?

 On Thu, Aug 13, 2015 at 10:37 PM, Paul Bell arach...@gmail.com wrote:

 Hi All,

 I hope someone can shed some light on this because I'm getting desperate!

 I try to start components zk, mesos-master, and marathon in that order.
 They are started via a program that SSHs to the sole host and does service
 xxx start. Everyone starts happily enough. But the Mesos UI shows me:

 *This master is not the leader, redirecting in 0 seconds ... go now*

 The pattern seen in all of the mesos-master.INFO logs (one of which shown
 below) is that the mesos-master with the correct IP@ starts. But then a
 new leader is detected and becomes leading master. This new leader shows
 UPID *(UPID=master@127.0.1.1:5050)*

 I've tried clearing what ZK and mesos-master state I can find, but this
 problem will not go away.

 Would someone be so kind as to a) explain what is happening here and b)
 suggest remedies?

 Thanks very much.

 -Paul


 Log file created at: 2015/08/13 10:19:43
 Running on machine: 71.100.14.9
 Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
 I0813 10:19:43.225636  2542 logging.cpp:172] INFO level logging started!
 I0813 10:19:43.235213  2542 main.cpp:181] Build: 2015-05-05 06:15:50 by
 root
 I0813 10:19:43.235244  2542 main.cpp:183] Version: 0.22.1
 I0813 10:19:43.235257  2542 main.cpp:186] Git tag: 0.22.1
 I0813 10:19:43.235268  2542 main.cpp:190] Git SHA:
 d6309f92a7f9af3ab61a878403e3d9c284ea87e0
 I0813 10:19:43.245098  2542 leveldb.cpp:176] Opened db in 9.386828ms
 I0813 10:19:43.247138  2542 leveldb.cpp:183] Compacted db in 1.956669ms
 I0813 10:19:43.247194  2542 leveldb.cpp:198] Created db iterator in
 13961ns
 I0813 10:19:43.247206  2542 leveldb.cpp:204] Seeked to beginning of db in
 677ns
 I0813 10:19:43.247215  2542 leveldb.cpp:273] Iterated through 0 keys in
 the db in 243ns
 I0813 10:19:43.247252  2542 replica.cpp:744] Replica recovered with log
 positions 0 - 0 with 1 holes and 0 unlearned
 I0813 10:19:43.248755  2611 log.cpp:238] Attempting to join replica to
 ZooKeeper group
 I0813 10:19:43.248924  2542 main.cpp:306] Starting Mesos master
 I0813 10:19:43.249244  2612 recover.cpp:449] Starting replica recovery
 I0813 10:19:43.250239  2612 recover.cpp:475] Replica is in EMPTY status
 I0813 10:19:43.250819  2612 replica.cpp:641] Replica in EMPTY status
 received a broadcasted recover request
 I0813 10:19:43.251014  2607 recover.cpp:195] Received a recover response
 from a replica in EMPTY status
 *I0813 10:19:43.249503  2542 master.cpp:349] Master
 20150813-101943-151938119-5050-2542 (71.100.14.9) started on
 71.100.14.9:5050*
 I0813 10:19:43.252053  2610 recover.cpp:566] Updating replica status to
 STARTING
 I0813 10:19:43.252571  2542 master.cpp:397] Master allowing
 unauthenticated frameworks to register
 I0813 10:19:43.253159  2542 master.cpp:402] Master allowing
 unauthenticated slaves to register
 I0813 10:19:43.254276  2612 leveldb.cpp:306] Persisting metadata (8
 bytes) to leveldb took 1.816161ms
 I0813 10:19:43.254323  2612 replica.cpp:323] Persisted replica status to
 STARTING
 I0813 10:19:43.254905  2612 recover.cpp:475] Replica is in STARTING status
 I0813 10:19:43.255203  2612 replica.cpp:641] Replica in STARTING status
 received a broadcasted recover request
 I0813 10:19:43.255265  2612 recover.cpp:195] Received a recover response
 from a replica in STARTING status
 I0813 10:19:43.255343  2612 recover.cpp:566] Updating replica status to
 VOTING
 I0813 10:19:43.258730  2611 master.cpp:1295] Successfully attached file
 '/var/log/mesos/mesos-master.INFO'
 I0813 10:19:43.258760  2609 contender.cpp:131] Joining the ZK group
 I0813 10:19:43.258862  2612 leveldb.cpp:306] Persisting metadata (8
 bytes) to leveldb took 3.477458ms
 I0813 10:19:43.258894  2612 replica.cpp:323] Persisted replica status to
 VOTING
 I0813 10:19:43.258934  2612 recover.cpp:580] Successfully joined the
 Paxos group
 I0813 10:19:43.258987  2612 recover.cpp:464] Recover process

Re: Can't start master properly (stale state issue?); help!

2015-08-13 Thread Marco Massenzio
On Thu, Aug 13, 2015 at 11:53 AM, Paul Bell arach...@gmail.com wrote:

 Marco & haosdent,

 This is just a quick note to say thank you for your replies.

 No problem, you're welcome.


 I will answer you much more fully tomorrow, but for now can only manage a
 few quick observations & questions:

 1. Having some months ago encountered a known problem with the IP@
 127.0.1.1 (I'll provide references tomorrow), I early on configured
 /etc/hosts, replacing myHostName 127.0.1.1 with myHostName Real_IP.
 That said, I can't rule out a race condition whereby ZK | mesos-master saw
 the original unchanged /etc/hosts before I zapped it.

 2. What is a znode and how would I drop it?

 so, the znode is the fancy name that ZK gives to the nodes in its tree
(trivially, the path) - assuming that you give Mesos the following ZK URL:
zk://10.10.0.5:2181/mesos/prod

the 'znode' would be `/mesos/prod` and you could go inspect it (using
zkCli.sh) by doing:
 ls /mesos/prod

you should see at least one (with the Master running) file: info_001 or
json.info_0001 (depending on whether you're running 0.23 or 0.24) and
you could then inspect its contents with:
 get /mesos/prod/info_001
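
To make the URL-to-znode mapping concrete, here is a small sketch that splits a zk:// URL into its ZooKeeper host list and its znode path. The function name `split_zk_url` is hypothetical (it is not part of Mesos or ZooKeeper), and the sketch assumes the plain `zk://host:port[,host:port...]/path` form without credentials.

```python
def split_zk_url(url):
    """Split a zk:// URL into (list of host:port strings, znode path)."""
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError("expected a zk:// URL, got %r" % url)
    rest = url[len(prefix):]
    # Everything before the first slash is the comma-separated ensemble.
    host_part, sep, path = rest.partition("/")
    hosts = [h for h in host_part.split(",") if h]
    # No path at all means the ZooKeeper root znode.
    znode = "/" + path if sep else "/"
    return hosts, znode


hosts, znode = split_zk_url("zk://10.10.0.5:2181/mesos/prod")
print(hosts, znode)
```

The second value is the path you would pass to `ls` or `get` in zkCli.sh.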

For example, if I run a Mesos 0.23 on my localhost, against ZK on the same host:

$ ./bin/mesos-master.sh --zk=zk://localhost:2181/mesos/test --quorum=1
--work_dir=/tmp/m23-2 --port=5053
I can connect to ZK via zkCli.sh and:

[zk: localhost:2181(CONNECTED) 4] ls /mesos/test
[info_06, log_replicas]
[zk: localhost:2181(CONNECTED) 6] get /mesos/test/info_06
#20150813-120952-18983104-5053-14072ц 'master@192.168.33.1:5053
* 192.168.33.120.23.0

cZxid = 0x314
dataLength = 93
 // a bunch of other metadata
numChildren = 0

(you can remove it with - you guessed it - `rmr /mesos/test` at the zkCli
prompt - stop Mesos first, or it will be a very unhappy Master :).
In the corresponding logs I see (note the new leader here too, even
though this was the one and only master):

I0813 12:09:52.126509 105455616 group.cpp:656] Trying to get
'/mesos/test/info_06' in ZooKeeper
W0813 12:09:52.127071 107065344 detector.cpp:444] Leading master
master@192.168.33.1:5053 is using a Protobuf binary format when registering
with ZooKeeper (info): this will be deprecated as of Mesos 0.24 (see
MESOS-2340)
I0813 12:09:52.127094 107065344 detector.cpp:481] A new leading master
(UPID=master@192.168.33.1:5053) is detected
I0813 12:09:52.127187 103845888 master.cpp:1481] The newly elected leader
is master@192.168.33.1:5053 with id 20150813-120952-18983104-5053-14072
I0813 12:09:52.127209 103845888 master.cpp:1494] Elected as the leading
master!
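
A rough way to spot the symptom from this thread (a leader UPID that points at a loopback address such as 127.0.1.1) is to pull the UPID out of the "A new leading master" log line and test its IP. This is only a sketch: the function names are made up here, and the regular expression assumes the log wording shown above.

```python
import ipaddress
import re

LEADER_RE = re.compile(r"A new leading master \(UPID=([^)]+)\) is detected")


def leading_master_upids(log_text):
    """Return every leader UPID announced in log_text, in order."""
    # Mail clients wrap long log lines, so collapse whitespace runs first.
    flat = " ".join(log_text.split())
    return LEADER_RE.findall(flat)


def upid_is_loopback(upid):
    """True if the IP in a name@ip:port UPID is a loopback address."""
    host = upid.split("@", 1)[1].rsplit(":", 1)[0]
    return ipaddress.ip_address(host).is_loopback


LOG = """\
I0813 12:09:52.127094 107065344 detector.cpp:481] A new leading master
(UPID=master@192.168.33.1:5053) is detected
"""

for upid in leading_master_upids(LOG):
    print(upid, "loopback" if upid_is_loopback(upid) else "not loopback")
```

Run against Paul's log, the same check would flag `master@127.0.1.1:5050` as a loopback leader.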


At this point, I'm almost sure you're running up against some issue with
the log-replica; but I'm the least competent guy here to help you on that
one, hopefully someone else will be able to add insight here.


RE: Can't start master properly (stale state issue?); help!

2015-08-13 Thread Klaus Ma
I ran into a similar issue with ZooKeeper + Mesos; I resolved it by removing
the 127.0.1.1 entry from /etc/hosts. Here is an example:
klaus@klaus-OptiPlex-780:~/Workspace/mesos$ cat /etc/hosts
127.0.0.1   localhost
127.0.1.1   klaus-OptiPlex-780   <= remove this line, and add a new line
mapping the real IP (e.g. 192.168.1.100) to the hostname
...
BTW, please also clean up the log directory and restart ZK & Mesos.
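
If it helps, the /etc/hosts check can be sketched as below. The helper name `loopback_entries` is hypothetical, and the sample text simply mirrors the two lines shown above; to check a real machine you would pass in the contents of /etc/hosts and the machine's own hostname.

```python
import ipaddress


def loopback_entries(hosts_text, hostname):
    """Return the loopback addresses that hosts_text maps hostname to."""
    found = []
    for line in hosts_text.splitlines():
        # Drop comments and blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if hostname not in names:
            continue
        try:
            if ipaddress.ip_address(address).is_loopback:
                found.append(address)
        except ValueError:
            continue  # ignore entries whose address does not parse
    return found


SAMPLE = """\
127.0.0.1   localhost
127.0.1.1   klaus-OptiPlex-780
"""

print(loopback_entries(SAMPLE, "klaus-OptiPlex-780"))
```

An empty result means the hostname no longer resolves to a loopback address through this file.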

If you have any more questions, please let me know.

Regards,
Klaus Ma (马达), PMP® | http://www.cguru.net

Date: Thu, 13 Aug 2015 12:20:34 -0700
Subject: Re: Can't start master properly (stale state issue?); help!
From: ma...@mesosphere.io
To: user@mesos.apache.org


On Thu, Aug 13, 2015 at 11:53 AM, Paul Bell arach...@gmail.com wrote:
Marco & haosdent,
This is just a quick note to say thank you for your replies.

No problem, you're welcome.

I will answer you much more fully tomorrow, but for now can only manage a few
quick observations & questions:
1. Having some months ago encountered a known problem with the IP@ 127.0.1.1
(I'll provide references tomorrow), I early on configured /etc/hosts, replacing
myHostName 127.0.1.1 with myHostName Real_IP. That said, I can't rule out
a race condition whereby ZK | mesos-master saw the original unchanged
/etc/hosts before I zapped it.
2. What is a znode and how would I drop it?

so, the znode is the fancy name that ZK gives to the nodes in its tree
(trivially, the path) - assuming that you give Mesos the following ZK URL:
zk://10.10.0.5:2181/mesos/prod
the 'znode' would be `/mesos/prod` and you could go inspect it (using zkCli.sh)
by doing:
 ls /mesos/prod
you should see at least one (with the Master running) file: info_001 or
json.info_0001 (depending on whether you're running 0.23 or 0.24) and you
could then inspect its contents with:
 get /mesos/prod/info_001
For example, if I run a Mesos 0.23 on my localhost, against ZK on the same host:

$ ./bin/mesos-master.sh --zk=zk://localhost:2181/mesos/test --quorum=1
--work_dir=/tmp/m23-2 --port=5053
I can connect to ZK via zkCli.sh and:
[zk: localhost:2181(CONNECTED) 4] ls /mesos/test
[info_06, log_replicas]
[zk: localhost:2181(CONNECTED) 6] get /mesos/test/info_06
#20150813-120952-18983104-5053-14072ц 'master@192.168.33.1:5053* 
192.168.33.120.23.0

cZxid = 0x314
dataLength = 93
 // a bunch of other metadata
numChildren = 0
(you can remove it with - you guessed it - `rmr /mesos/test` at the zkCli
prompt - stop Mesos first, or it will be a very unhappy Master :).
In the corresponding logs I see (note the new leader here too, even though
this was the one and only master):
I0813 12:09:52.126509 105455616 group.cpp:656] Trying to get 
'/mesos/test/info_06' in ZooKeeper
W0813 12:09:52.127071 107065344 detector.cpp:444] Leading master 
master@192.168.33.1:5053 is using a Protobuf binary format when registering 
with ZooKeeper (info): this will be deprecated as of Mesos 0.24 (see MESOS-2340)
I0813 12:09:52.127094 107065344 detector.cpp:481] A new leading master 
(UPID=master@192.168.33.1:5053) is detected
I0813 12:09:52.127187 103845888 master.cpp:1481] The newly elected leader is 
master@192.168.33.1:5053 with id 20150813-120952-18983104-5053-14072
I0813 12:09:52.127209 103845888 master.cpp:1494] Elected as the leading master!

At this point, I'm almost sure you're running up against some issue with the 
log-replica; but I'm the least competent guy here to help you on that one, 
hopefully someone else will be able to add insight here.
I start the services (zk, master, marathon; all on the same host) by SSHing into
the host & doing "service xxx start" commands.
Again, thanks very much; and more tomorrow.
Cordially,
Paul
On Thu, Aug 13, 2015 at 1:08 PM, haosdent haosd...@gmail.com wrote:
Hello, how do you start the master? Could you also run netstat -antp | grep
5050 to check whether multiple master processes are running on the same
machine?
On Thu, Aug 13, 2015 at 10:37 PM, Paul Bell arach...@gmail.com wrote:
Hi All,
I hope someone can shed some light on this because I'm getting desperate!
I try to start components zk, mesos-master, and marathon in that order. They 
are started via a program that SSHs to the sole host and does service xxx 
start. Everyone starts happily enough. But the Mesos UI shows me:
This master is not the leader, redirecting in 0 seconds ... go now

The pattern seen in all of the mesos-master.INFO logs (one of which is shown
below) is that the mesos-master with the correct IP@ starts. But then a new
leader is detected and becomes leading master. This new leader shows UPID
(UPID=master@127.0.1.1:5050)
I've tried clearing what ZK and mesos-master state I can find, but this problem 
will not go away.
Would someone be so kind as to a) explain what is happening here and b) suggest 
remedies?
Thanks very much