Hi there, I have a cluster running CDH 5.1 on top of Red Hat 6.5, where the default Python version is 2.6. I am trying to set up a proper IPython notebook environment to develop Spark applications using PySpark.
Here is the tutorial that I have been following: <http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/>. However, it turned out that the author was using IPython 1, whereas we have the latest Anaconda Python 2.7 installed on our name node. After finishing the tutorial, I can connect to the Spark cluster, but whenever I try to distribute the work it errors out, and Google tells me the cause is the difference in Python versions across the cluster (Python 2.7 on the name node versus the default 2.6 elsewhere).

Here are the two options that I am planning to try:

(1) Remove Anaconda Python from the name node and install an IPython version that is compatible with Python 2.6.
(2) Install Anaconda Python on every node and make it the default Python version across the whole cluster (however, I am not sure whether this would break the existing environment, since some running services are built on Python 2.6...).

Please let me know which is the proper way to set up an IPython notebook environment.

Best regards,
Bin
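
P.S. In case it helps with diagnosing this, below is a minimal sketch of how the version mismatch could be confirmed from the notebook. It assumes an existing SparkContext named `sc`. The helper names (`local_python_version`, `find_mismatches`, `worker_python_versions`) are made up for illustration and are not part of PySpark. The Spark job may itself fail when the interpreters differ, which would already point to the same problem; in that case running `python -V` on each node is the fallback.

```python
import sys


def local_python_version():
    """Return (major, minor) of the interpreter executing this function."""
    return tuple(sys.version_info[:2])


def find_mismatches(driver_version, worker_versions):
    """Return the distinct worker versions that differ from the driver's."""
    driver = tuple(driver_version)
    return sorted(set(tuple(v) for v in worker_versions if tuple(v) != driver))


def worker_python_versions(sc, num_tasks=100):
    """Collect the distinct Python versions reported by the executors.

    `sc` is assumed to be an existing SparkContext; `num_tasks` is an
    arbitrary number of small tasks so that several executors get a share.
    """
    return (sc.parallelize(range(num_tasks), num_tasks)
              .map(lambda _: local_python_version())
              .distinct()
              .collect())
```

With a live context, `find_mismatches(local_python_version(), worker_python_versions(sc))` would list any worker versions that differ from the notebook's interpreter; an empty list would mean the versions agree.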