@chris, a couple of questions:
1. --spark_home=/usr/iop/current/spark-client: is this a full Spark
distribution? The name seems to imply otherwise.
2. Can you check the environment variables in the shell where you are
running the install? I want to make sure PYTHONPATH isn't being set and
causing the weird behavior Chip mentioned above. A quick check is sketched
below.
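
A minimal sanity check, assuming the install runs as the same user and in
the same shell environment that launches Jupyter (the paths are the ones
from this thread):

    # Show any PYTHONPATH/SPARK_HOME already set in the install environment
    env | grep -E 'PYTHONPATH|SPARK_HOME'

    # Confirm the spark-client directory actually ships the pyspark sources
    ls /usr/iop/current/spark-client/python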

On Wed, Dec 21, 2016 at 3:11 AM chris snow <chsnow...@gmail.com> wrote:

> Hi Chip,
>
> Thanks for the response.
>
> Is this a defect with toree, or have I misconfigured something?
>
> Many thanks,
>
> Chris
>
> On 15 December 2016 at 19:14, Chip Senkbeil <chip.senkb...@gmail.com> wrote:
>
> > It's showing your PYTHONPATH as
> > /disk3/local/filecache/103/spark-assembly.jar. Toree is looking for
> > pyspark on your PYTHONPATH.
> >
> > https://github.com/apache/incubator-toree/blob/master/pyspark-interpreter/src/main/scala/org/apache/toree/kernel/interpreter/pyspark/PySparkProcess.scala#L78
> >
> > That code shows us augmenting the existing PYTHONPATH to include
> > $SPARK_HOME/python/, which is where we search for your pyspark
> > distribution.
> >
> > Your PYTHONPATH isn't even showing $SPARK_HOME/python/ being added,
> > which is also troubling.
> >
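> > A minimal way to reproduce that lookup by hand, assuming the SPARK_HOME
> > and python paths mentioned in this thread (the py4j zip name varies by
> > Spark version, hence the glob):
> >
> >     export SPARK_HOME=/usr/iop/current/spark-client
> >     # pyspark needs its sources and the bundled py4j zip on the path
> >     PY4J_ZIP=$(ls $SPARK_HOME/python/lib/py4j-*.zip | head -1)
> >     PYTHONPATH="$SPARK_HOME/python:$PY4J_ZIP" \
> >         ${HOME}/anaconda2/bin/python2.7 -c 'import pyspark; print(pyspark.__file__)'
> >
> > If that import fails on the driver host, the PySpark interpreter is
> > likely to hit the same error.
> >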
> > On Wed, Dec 14, 2016 at 9:41 AM chris snow <chsnow...@gmail.com> wrote:
> >
> > > I'm trying to set up toree as follows:
> > >
> > >     CLUSTER_NAME=$(curl -s -k -u $BI_USER:$BI_PASS -X GET \
> > >         https://${BI_HOST}:9443/api/v1/clusters \
> > >       | python -c 'import sys, json; print(json.load(sys.stdin)["items"][0]["Clusters"]["cluster_name"]);')
> > >     echo Cluster Name: $CLUSTER_NAME
> > >
> > >     CLUSTER_HOSTS=$(curl -s -k -u $BI_USER:$BI_PASS -X GET \
> > >         https://${BI_HOST}:9443/api/v1/clusters/${CLUSTER_NAME}/hosts \
> > >       | python -c 'import sys, json; items = json.load(sys.stdin)["items"]; hosts = [item["Hosts"]["host_name"] for item in items]; print(" ".join(hosts));')
> > >     echo Cluster Hosts: $CLUSTER_HOSTS
> > >
> > >     wget -c https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh
> > >
> > >     # Install anaconda if it isn't already installed
> > >     [[ -d anaconda2 ]] || bash Anaconda2-4.1.1-Linux-x86_64.sh -b
> > >
> > >     # Check that toree is available; if not, install it
> > >     ./anaconda2/bin/python -c 'import toree' || ./anaconda2/bin/pip install toree
> > >
> > >     # Install toree
> > >     ./anaconda2/bin/jupyter toree install \
> > >             --spark_home=/usr/iop/current/spark-client/ \
> > >             --user --interpreters Scala,PySpark,SparkR \
> > >             --spark_opts="--master yarn" \
> > >             --python_exec=${HOME}/anaconda2/bin/python2.7
> > >
> > >     # Install anaconda on all of the cluster nodes
> > >     for CLUSTER_HOST in ${CLUSTER_HOSTS};
> > >     do
> > >        if [[ "$CLUSTER_HOST" != "$BI_HOST" ]];
> > >        then
> > >           echo "*** Processing $CLUSTER_HOST ***"
> > >           ssh $BI_USER@$CLUSTER_HOST "wget -q -c https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh"
> > >           ssh $BI_USER@$CLUSTER_HOST "[[ -d anaconda2 ]] || bash Anaconda2-4.1.1-Linux-x86_64.sh -b"
> > >
> > >           # You can install your pip modules on each node using something like this:
> > >           # ssh $BI_USER@$CLUSTER_HOST "${HOME}/anaconda2/bin/python -c 'import yourlibrary' || ${HOME}/anaconda2/bin/pip install yourlibrary"
> > >        fi
> > >     done
> > >
> > >     echo 'Finished installing'
> > >
> > > However, when I try to run a pyspark job I get the following error:
> > >
> > >     Name: org.apache.toree.interpreter.broker.BrokerException
> > >     Message: Py4JJavaError: An error occurred while calling
> > >     z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> > >     : org.apache.spark.SparkException: Job aborted due to stage failure:
> > >     Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task
> > >     1.3 in stage 0.0 (TID 6, bi4c-xxxx-data-3.bi.services.bluemix.net):
> > >     org.apache.spark.SparkException:
> > >     Error from python worker:
> > >       /home/biadmin/anaconda2/bin/python2.7: No module named pyspark
> > >     PYTHONPATH was:
> > >       /disk3/local/filecache/103/spark-assembly.jar
> > >     java.io.EOFException
> > >
> > > Any ideas what is going wrong?
> > >
> >
>
