Hi there,

I have a cluster running CDH 5.1 on top of Red Hat 6.5, where the default
Python version is 2.6. I am trying to set up a proper IPython Notebook
environment for developing Spark applications with PySpark.

Here
<http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/>
is the tutorial I have been following. However, it turns out the author
was using IPython 1, whereas we have the latest Anaconda Python 2.7
installed on our name node. After finishing the tutorial, I can connect
to the Spark cluster, but whenever I try to distribute work it errors
out, and Google tells me the cause is the mismatch in Python versions
across the cluster.
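
To confirm it really is a version mismatch, I put together a minimal
sketch that compares the driver's Python against what the workers report
(the app name is arbitrary):

    from __future__ import print_function
    import sys
    from pyspark import SparkContext

    sc = SparkContext(appName="python-version-check")

    def worker_version(_):
        # Runs on the executors, so this reports the workers' Python.
        import sys
        return sys.version

    # One partition per default-parallelism slot, so the check reaches
    # several executors rather than just one.
    versions = (sc.parallelize(range(sc.defaultParallelism),
                               sc.defaultParallelism)
                  .map(worker_version)
                  .distinct()
                  .collect())

    print("driver :", sys.version)
    print("workers:", versions)

On my cluster the driver prints 2.7 (Anaconda) while the workers report
2.6, which matches what the error messages suggest.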

Here are a couple of approaches I am planning to try:
(1) Remove Anaconda Python from the name node and install an IPython
version that is compatible with Python 2.6.
(2) Install Anaconda Python on every node and make it the default Python
version across the whole cluster (however, I am not sure whether this
would break the existing environment, since some running services depend
on Python 2.6... see the sketch after this list for a less invasive
variant).
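
If I understand the PySpark docs correctly, option (2) might not require
changing the system default at all: the PYSPARK_PYTHON environment
variable tells PySpark which interpreter its workers should launch. A
minimal sketch, assuming Anaconda is installed at the same (hypothetical)
path on every node:

    import os

    # Hypothetical path; assumes Anaconda lives at the same location
    # on all nodes. The system default Python, and the services built
    # on 2.6, stay untouched.
    os.environ["PYSPARK_PYTHON"] = "/opt/anaconda/bin/python"

    # Must be set before the SparkContext is created, since the
    # context picks up the interpreter path at construction time.
    from pyspark import SparkContext
    sc = SparkContext(appName="pinned-python")

The same variable can apparently also be set in spark-env.sh on each
node, which might be the cleaner place for a cluster-wide setting.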

Let me know which would be the proper way to set up an IPython Notebook
environment.

Best regards,

Bin
