Re: Mesos coarse mode not working (fine grained does)
Bumping 1-on-1 conversation to the mailing list:

On 10 Feb 2015, at 13:24, Hans van den Bogert hansbog...@gmail.com wrote:

It's self-built; I can't do otherwise, as I can't install packages on the cluster here. The problem seems to be with libtool. When compiling Mesos on a host with apr-devel and apr-util-devel, the shared libraries are named libapr*.so, without the version suffix (the versioned ones are also installed, of course). On our compute nodes no *-devel packages are installed, just the binary packages, whose libraries are named libapr*.so.0. But even the "make install"-ed binaries still refer to the devel packages' shared libraries. I'm not sure if this is intended behaviour by libtool, because it is libtool that, at start/run time, changes the binaries' RPATH (which is initially well defined) to the libapr*.so. But this is probably autoconf fu; just hoping someone here has dealt with the same issue.

On 09 Feb 2015, at 20:37, Tim Chen t...@mesosphere.io wrote:

I'm still trying to grasp what your environment setup is like; it's odd to see g++ stderr when you're running Mesos. Are you building Mesos yourself and running it, or did you install it through some package?

Tim

On Mon, Feb 9, 2015 at 1:03 AM, Hans van den Bogert hansbog...@gmail.com wrote:

Okay, I was kind of ambiguous. I assume you mean this one:

[vdbogert@node002 ~]$ cat /local/vdbogert/var/lib/mesos/slaves/20150206-110658-16813322-5050-5515-S0/frameworks/20150208-200943-16813322-5050-26370-/executors/3/runs/latest/stdout
[vdbogert@node002 ~]$

It's empty.

On 09 Feb 2015, at 06:22, Tim Chen t...@mesosphere.io wrote:

Hi Hans,

I was referring to the stdout/stderr of the task, not the slave.

Tim

On Sun, Feb 8, 2015 at 1:21 PM, Hans van den Bogert hansbog...@gmail.com wrote:

> Hi there,
> It looks like while trying to launch the executor (or one of the processes, like the fetcher to fetch the URIs)

The fetching seems to have succeeded, as well as the extraction, as the "spark-1.2.0-bin-hadoop2.4" directory exists in the slave sandbox.

Furthermore, it seems the executor URI is superfluous in my environment. I've checked the code: if a URI is not provided, the task will not refer to an extracted distro but to a directory with the same path as the current Spark distro, which makes sense in a cluster environment where data is on a network-shared disk. I've tried *not* supplying a spark.executor.uri, and fine-grained mode still works fine. Coarse-grained mode still fails with the same libapr* errors.

> was failing because of the dependencies problem you see. Your mesos-slave shouldn't be able to run though; were you running a 0.20.0 slave and upgraded to 0.21.0? We introduced the dependencies on libapr and libsvn in Mesos 0.21.0.

I've only ever tried compiling 0.21.0. I've checked all binaries in MESOS_HOME/build/src/.libs with 'ldd', and all refer to a correct, existing libapr*-1.so.0 (mind the trailing ".0").

> What's the stdout for the task like?

Mesos slaves' stdout is empty. It's a pity Spark's logging in this case is pretty marginal, as is Mesos'. As far as I can see, one can't log the (raw) task descriptions, which would be very helpful in this case. I could resort to building Spark from source as well and adding some logging, but I'm afraid I will introduce other peculiarities. Do you think it's my only option?

Thanks,
H.

> Tim
>
> On Mon, Feb 9, 2015 at 4:10 AM, Hans van den Bogert hansbog...@gmail.com wrote:
>
>> I wasn't thorough; the complete stderr includes:
>>
>> g++: /usr/lib64/libaprutil-1.so: No such file or directory
>> g++: /usr/lib64/libapr-1.so: No such file or directoryn
>>
>> (including that trailing 'n')
>>
>> Though I can't figure out how the process indirection goes from the front-end Spark application to the Mesos executors, or where this shared library error comes from.
>>
>> Hope someone can shed some light,
>> Thanks
>>
>> On 08 Feb 2015, at 14:15, Hans van den Bogert hansbog...@gmail.com wrote:
>>
>>> Hi,
>>> I'm trying to get coarse mode to work under Mesos (0.21.0). I thought this would be a trivial change, as Mesos was working well in fine-grained mode. However, the Mesos tasks fail, and I can't pinpoint where things go wrong. This is a Mesos stderr log from a slave:
>>>
>>> Fetching URI 'http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz'
>>> I0208 12:57:45.415575 25720 fetcher.cpp:126] Downloading 'http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz' to '/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151/spark-1.2.0-bin-hadoop2.4.tgz'
>>> I0208 12:58:09.146960 25720 fetcher.cpp:64] Extracted resource
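The naming mismatch described in the 10 Feb message (unversioned libapr*.so from the -devel packages versus libapr*.so.0 from the runtime packages) can be checked directly on a compute node. The following is a minimal Python sketch, not something that ships with Mesos or libtool: for each library path a binary or link step refers to, it reports whether the file exists and, if it does not, which versioned siblings (for example libapr-1.so.0 next to a missing libapr-1.so) are present in the same directory. The helper name `check_libs` is hypothetical; the two example paths are the ones from the g++ errors in this thread.

```python
import glob
import os


def check_libs(paths):
    """Map each library path to (exists, versioned_siblings).

    For a missing unversioned name such as /usr/lib64/libapr-1.so, look for
    versioned files like /usr/lib64/libapr-1.so.0 in the same directory,
    which is what a runtime-only (non -devel) package typically provides.
    """
    report = {}
    for path in paths:
        exists = os.path.exists(path)
        # Only search for siblings when the referenced file itself is absent.
        siblings = [] if exists else sorted(glob.glob(glob.escape(path) + ".*"))
        report[path] = (exists, siblings)
    return report


if __name__ == "__main__":
    # Example paths taken from the g++ errors quoted in this thread.
    for lib, (ok, alts) in check_libs([
        "/usr/lib64/libapr-1.so",
        "/usr/lib64/libaprutil-1.so",
    ]).items():
        status = "ok" if ok else "MISSING"
        print(f"{lib}: {status}" + (f" (found: {', '.join(alts)})" if alts else ""))
```

On a node with only the runtime packages one would expect the unversioned names to show up as missing with a `.so.0` file next to them; that is an expectation based on the description above, not a captured result.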
Mesos coarse mode not working (fine grained does)
Hi,

I'm trying to get coarse mode to work under Mesos (0.21.0). I thought this would be a trivial change, as Mesos was working well in fine-grained mode. However, the Mesos tasks fail, and I can't pinpoint where things go wrong.

This is a Mesos stderr log from a slave:

Fetching URI 'http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz'
I0208 12:57:45.415575 25720 fetcher.cpp:126] Downloading 'http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz' to '/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151/spark-1.2.0-bin-hadoop2.4.tgz'
I0208 12:58:09.146960 25720 fetcher.cpp:64] Extracted resource '/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151/spark-1.2.0-bin-hadoop2.4.tgz' into '/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151'

Mesos slaves' stdout is empty. And I can confirm the Spark distro is correctly extracted:

$ ls
spark-1.2.0-bin-hadoop2.4  spark-1.2.0-bin-hadoop2.4.tgz  stderr  stdout

The spark-submit log is here: http://pastebin.com/ms3uZ2BK
Mesos-master: http://pastebin.com/QH2Vn1jX
Mesos-slave: http://pastebin.com/DXFYemix

Can somebody point me to logs, etc. to investigate this further? I'm feeling kind of blind.

Furthermore, do the executors on Mesos inherit all configs from the Spark application/submit? E.g. I've given my executors 20GB of memory through a spark-submit "--conf" parameter. Should these settings also be present in the spark-1.2.0-bin-hadoop2.4.tgz distribution's configs?

If, in order to be helped here, I need to present more logs etc., please let me know.
Regards,
Hans van den Bogert

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
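On the question of passing executor settings: the sketch below only shows the usual command-line mechanism, one `--conf KEY=VALUE` pair per setting, and does not settle whether the executors additionally read the configs inside the extracted spark-1.2.0-bin-hadoop2.4.tgz. It is a minimal Python helper under stated assumptions: `spark.mesos.coarse` and `spark.executor.memory` are assumed to be the Spark 1.x property names for coarse-grained mode and executor memory, while the function name `spark_submit_argv`, the master host `mesos-master.example` and the application file `my_app.py` are hypothetical placeholders.

```python
import shlex


def spark_submit_argv(master, app, conf, app_args=()):
    """Build a spark-submit command line with one --conf KEY=VALUE per setting.

    Keys are sorted so the resulting command is deterministic.
    """
    argv = ["spark-submit", "--master", master]
    for key in sorted(conf):
        argv += ["--conf", f"{key}={conf[key]}"]
    argv.append(app)
    argv.extend(app_args)
    return argv


if __name__ == "__main__":
    # Hypothetical master host and application file; adjust for your cluster.
    argv = spark_submit_argv(
        "mesos://mesos-master.example:5050",
        "my_app.py",
        {
            "spark.mesos.coarse": "true",     # assumed switch for coarse-grained mode
            "spark.executor.memory": "20g",   # the 20GB from the question above
        },
    )
    print(shlex.join(argv))
```

`spark.executor.uri` is left out on purpose: Hans reports that, with Spark on a network-shared disk, fine-grained mode works without it.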
Re: Mesos coarse mode not working (fine grained does)
I wasn't thorough; the complete stderr includes:

g++: /usr/lib64/libaprutil-1.so: No such file or directory
g++: /usr/lib64/libapr-1.so: No such file or directoryn

(including that trailing 'n')

Though I can't figure out how the process indirection goes from the front-end Spark application to the Mesos executors, or where this shared library error comes from.

Hope someone can shed some light,
Thanks
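When a task's stderr holds more than the two lines quoted above, a small filter can list exactly which files the failing g++ invocation could not find. A minimal Python sketch, assuming the messages follow the `g++: <path>: No such file or directory` shape shown above; `missing_paths` is a hypothetical helper, and the pattern is deliberately not anchored at end of line so a stray trailing character such as the 'n' above still matches.

```python
import re
import sys

# Matches e.g. "g++: /usr/lib64/libapr-1.so: No such file or directory".
# Not anchored at end of line, so a stray trailing character is tolerated.
_MISSING = re.compile(r"g\+\+: (/\S+): No such file or directory")


def missing_paths(stderr_text):
    """Return the absolute paths g++ reported as missing, in order, without duplicates."""
    seen = []
    for match in _MISSING.finditer(stderr_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


if __name__ == "__main__":
    # Usage: python missing_paths.py <sandbox>/stderr [more stderr files...]
    for name in sys.argv[1:]:
        with open(name) as f:
            for path in missing_paths(f.read()):
                print(path)
```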
Re: Mesos coarse mode not working (fine grained does)
Hi there,

It looks like launching the executor (or one of the processes, like the fetcher that fetches the URIs) was failing because of the dependency problem you see. Your mesos-slave shouldn't be able to run, though; were you running a 0.20.0 slave and upgraded to 0.21.0? We introduced the dependencies on libapr and libsvn in Mesos 0.21.0.

What's the stdout for the task like?

Tim
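Tim's question about a 0.20.0 to 0.21.0 upgrade and Hans's `ldd` check of the binaries in MESOS_HOME/build/src/.libs both come down to whether the apr and svn libraries resolve on every slave. A minimal Python sketch for that check, assuming glibc-style `ldd` output (`name => path (0xaddress)` or `name => not found`): it reports, for library names starting with `libapr` or `libsvn` (prefixes chosen after Tim's note about libapr and libsvn), the resolved path, or `None` when ldd prints "not found". The helper names `parse_ldd` and `apr_svn_status` are hypothetical, and the script simply runs `ldd` on each path given on the command line.

```python
import subprocess
import sys


def parse_ldd(output):
    """Parse ldd output into ({name: resolved path}, [names reported 'not found'])."""
    resolved, missing = {}, []
    for raw in output.splitlines():
        line = raw.strip()
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no "name => path" mapping
        name, target = (part.strip() for part in line.split("=>", 1))
        if target.startswith("not found"):
            missing.append(name)
        else:
            # Keep only the path, dropping the "(0x...)" load address.
            path = target.split("(", 1)[0].strip()
            if path:  # vdso entries have an address but no path
                resolved[name] = path
    return resolved, missing


def apr_svn_status(output, prefixes=("libapr", "libsvn")):
    """Return {name: path or None} for libraries whose names start with a prefix."""
    resolved, missing = parse_ldd(output)
    status = {n: p for n, p in resolved.items() if n.startswith(prefixes)}
    status.update({n: None for n in missing if n.startswith(prefixes)})
    return status


if __name__ == "__main__":
    # Usage: python check_ldd.py MESOS_HOME/build/src/.libs/<binary-or-lib> [...]
    for binary in sys.argv[1:]:
        out = subprocess.run(["ldd", binary], capture_output=True, text=True).stdout
        for name, path in sorted(apr_svn_status(out).items()):
            print(f"{binary}: {name} -> {path or 'NOT FOUND'}")
```

Running it on every slave, rather than only on the build host, is the point: a library that resolves where the -devel packages are installed may not resolve on the compute nodes.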