Dependencies when running Spark Streaming on a Mesos cluster using Python

Luke Adolph Wed, 13 Jul 2016 03:58:04 -0700

Hi all,
My Spark cluster runs on Mesos. I wrote a Spark Streaming app in Python; the code is on
GitHub <https://github.com/adolphlwq/linkerProcessorSample>.


The app depends on the package "*org.apache.spark:spark-streaming-kafka_2.10:1.6.1*".

Spark on Mesos has two important concepts: the Spark framework and the Spark
executor.

I set my executor to run in a Docker image. The image's Dockerfile
<https://github.com/adolphlwq/linkerProcessorSample/blob/master/docker/Dockerfile>
is below:

# refer 'http://spark.apache.org/docs/latest/running-on-mesos.html#spark-properties'
# on 'spark.mesos.executor.docker.image' section

FROM ubuntu:14.04
WORKDIR /linker
RUN ln -f -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
# download mesos
RUN echo "deb http://repos.mesosphere.io/ubuntu/ trusty main" > /etc/apt/sources.list.d/mesosphere.list && \
    apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF && \
    apt-get update && \
    apt-get -y install mesos=0.28.1-2.0.20.ubuntu1404 openjdk-7-jre python-pip git vim curl
RUN git clone https://github.com/adolphlwq/linkerProcessorSample.git && \
    pip install -r linkerProcessorSample/docker/requirements.txt
RUN curl -fL http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz | tar xzf - -C /usr/local && \
    apt-get clean
ENV MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos.so \
    JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64 \
    SPARK_HOME=/usr/local/spark-1.6.0-bin-hadoop2.6
ENV PATH=$JAVA_HOME/bin:$PATH
WORKDIR $SPARK_HOME


When I submit my app with the command below:

dcos spark run --submit-args='--packages org.apache.spark:spark-streaming-kafka_2.10:1.6.1 \
       spark2cassandra.py zk topic' \
        -docker-image=adolphlwq/mesos-for-spark-exector-image:1.6.0.beta


The executor Docker container starts successfully, but it does not have the package
*org.apache.spark:spark-streaming-kafka_2.10:1.6.1*.
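
For reference, the jar paths in the stderr log in this message look like `/root/.ivy2/jars/<group>_<artifact>-<version>.jar`. The sketch below derives such a path from a `--packages` coordinate, which makes it easy to check for the expected file inside the container. It is only an illustration based on the paths visible in the log: `ivy_jar_path` is a hypothetical helper, not part of Spark, and the default directory is an assumption taken from the log. Note that the log lists the 1.6.0 jar although the command asks for 1.6.1.

```python
def ivy_jar_path(coordinate, ivy_jars_dir="/root/.ivy2/jars"):
    """Map a 'group:artifact:version' coordinate to the jar path format
    seen in the fetcher log (assumed pattern, hypothetical helper)."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError("expected group:artifact:version, got %r" % coordinate)
    group, artifact, version = parts
    # Pattern observed in the log: <group>_<artifact>-<version>.jar
    return "{0}/{1}_{2}-{3}.jar".format(ivy_jars_dir, group, artifact, version)


if __name__ == "__main__":
    # Coordinate passed to --packages in the submit command.
    print(ivy_jar_path("org.apache.spark:spark-streaming-kafka_2.10:1.6.1"))
```

Running `ls` on the printed path inside the executor container shows whether the jar is actually there.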

The *stderr* on Mesos is:

I0713 09:34:52.715551 18124 logging.cpp:188] INFO level logging started!
I0713 09:34:52.717797 18124 fetcher.cpp:424] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/6097419e-c2d0-4e5f-9a91-e5815de640c4-S4","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/home\/ubuntu\/spark2cassandra.py"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/com.101tec_zkclient-0.3.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.apache.kafka_kafka_2.10-0.8.2.1.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.slf4j_slf4j-api-1.7.10.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.spark-project.spark_unused-1.0.0.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/net.jpountz.lz4_lz4-1.3.0.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/log4j_log4j-1.2.17.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/com.yammer.metrics_metrics-core-2.2.0.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.apache.kafka_kafka-clients-0.8.2.1.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.xerial.snappy_snappy-java-1.1.2.jar"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/6097419e-c2d0-4e5f-9a91-e5815de640c4-S4\/frameworks\/7399b6f7-5dcd-4a9b-9846-e7948d5ffd11-0024\/executors\/driver-20160713093451-0015\/runs\/84419372-9482-4c58-8f87-4ba528b6885c"}
I0713 09:34:52.719846 18124 fetcher.cpp:379] Fetching URI '/home/ubuntu/spark2cassandra.py'
I0713 09:34:52.719866 18124 fetcher.cpp:250] Fetching directly into the sandbox directory
I0713 09:34:52.719925 18124 fetcher.cpp:187] Fetching URI '/home/ubuntu/spark2cassandra.py'
I0713 09:34:52.719945 18124 fetcher.cpp:167] Copying resource with command:cp '/home/ubuntu/spark2cassandra.py' '/tmp/mesos/slaves/6097419e-c2d0-4e5f-9a91-e5815de640c4-S4/frameworks/7399b6f7-5dcd-4a9b-9846-e7948d5ffd11-0024/executors/driver-20160713093451-0015/runs/84419372-9482-4c58-8f87-4ba528b6885c/spark2cassandra.py'
W0713 09:34:52.722587 18124 fetcher.cpp:272] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: /home/ubuntu/spark2cassandra.py
I0713 09:34:52.724138 18124 fetcher.cpp:456] Fetched '/home/ubuntu/spark2cassandra.py' to '/tmp/mesos/slaves/6097419e-c2d0-4e5f-9a91-e5815de640c4-S4/frameworks/7399b6f7-5dcd-4a9b-9846-e7948d5ffd11-0024/executors/driver-20160713093451-0015/runs/84419372-9482-4c58-8f87-4ba528b6885c/spark2cassandra.py'
I0713 09:34:52.724148 18124 fetcher.cpp:379] Fetching URI '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar'
I0713 09:34:52.724153 18124 fetcher.cpp:250] Fetching directly into the sandbox directory
I0713 09:34:52.724162 18124 fetcher.cpp:187] Fetching URI '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar'
I0713 09:34:52.724171 18124 fetcher.cpp:167] Copying resource with command:cp '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar' '/tmp/mesos/slaves/6097419e-c2d0-4e5f-9a91-e5815de640c4-S4/frameworks/7399b6f7-5dcd-4a9b-9846-e7948d5ffd11-0024/executors/driver-20160713093451-0015/runs/84419372-9482-4c58-8f87-4ba528b6885c/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar'
cp: cannot stat '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar': No such file or directory
Failed to fetch '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar': Failed to copy with command 'cp '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar'
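
The fetcher stops at the first path it cannot copy, so the log alone does not show whether the other jars listed in "Fetcher Info" are present on the agent. A minimal sketch for checking all of them at once, assuming the JSON after "Fetcher Info:" has been copied into a string and the script is run on the machine where the fetcher ran; `missing_fetch_uris` is a hypothetical helper name, not a Mesos or Spark API:

```python
import json
import os


def missing_fetch_uris(fetcher_info_json, exists=os.path.exists):
    """Return the local-path URIs from a 'Fetcher Info' JSON string
    that do not exist on this machine."""
    info = json.loads(fetcher_info_json)
    missing = []
    for item in info.get("items", []):
        value = item["uri"]["value"]
        # Only plain filesystem paths are checked; URIs with a scheme
        # (http://, hdfs://, ...) are skipped.
        if "://" in value:
            continue
        if not exists(value):
            missing.append(value)
    return missing


if __name__ == "__main__":
    # Shortened sample with two of the items from the log above.
    sample = json.dumps({"items": [
        {"action": "BYPASS_CACHE",
         "uri": {"extract": True,
                 "value": "/home/ubuntu/spark2cassandra.py"}},
        {"action": "BYPASS_CACHE",
         "uri": {"extract": True,
                 "value": "/root/.ivy2/jars/com.101tec_zkclient-0.3.jar"}},
    ]})
    for path in missing_fetch_uris(sample):
        print("missing: {0}".format(path))
```

The `exists` parameter is only there so the check can be exercised without touching the real filesystem.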


Could somebody help me solve this dependency problem?

Thanks.
-- 
Thanks & Best Regards
卢文泉 | Adolph Lu
TEL：+86 15651006559
Linker Networks(http://www.linkernetworks.com/)
