Re: collect() works, take() returns ImportError: No module named iter

2015-08-13 Thread YaoPau
In case anyone runs into this issue in the future, we got it working. The following variable must be set on the edge node:

export PYSPARK_PYTHON=/your/path/to/whatever/python/you/want/to/run/bin/python

I didn't realize that variable gets passed to every worker node. All I saw when searching for
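To confirm which interpreter a given PYSPARK_PYTHON path actually points to, you can ask that executable for its own version. This is a minimal sketch, not part of PySpark. It assumes Python 3.5+ for the checking script, and `interpreter_version` is a hypothetical helper. It only inspects the interpreter on the machine where it runs, so to be meaningful the same path would have to be checked on every worker too.

```python
import os
import subprocess
import sys

# Prints "major.minor.micro"; written so the target interpreter may even be Python 2.6/2.7.
_VERSION_SNIPPET = "import sys; print('%d.%d.%d' % tuple(sys.version_info[:3]))"


def interpreter_version(python_path):
    """Return (major, minor, micro) as reported by the interpreter at python_path."""
    out = subprocess.run(
        [python_path, "-c", _VERSION_SNIPPET],
        check=True,                 # raise if the path is wrong or the interpreter fails
        stdout=subprocess.PIPE,
        universal_newlines=True,    # decode stdout to str
    ).stdout
    return tuple(int(part) for part in out.strip().split("."))


if __name__ == "__main__":
    # Hypothetical usage: fall back to the current interpreter if PYSPARK_PYTHON is unset.
    target = os.environ.get("PYSPARK_PYTHON", sys.executable)
    print(target, interpreter_version(target))
```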

Re: collect() works, take() returns ImportError: No module named iter

2015-08-10 Thread Ruslan Dautkhanov
There was a similar problem reported on this list before. Weird Python errors like this generally mean you have different versions of Python on the nodes of your cluster. Can you check that? From the error stack, you are using 2.7.10 |Anaconda 2.3.0, while the OS/CDH version of Python is probably 2.6.

Re: collect() works, take() returns ImportError: No module named iter

2015-08-10 Thread Davies Liu
Is it possible that you have Python 2.7 on the driver but Python 2.6 on the workers? PySpark requires the same minor version of Python on both the driver and the workers. In PySpark 1.4+, it checks this before running any tasks. On Mon, Aug 10, 2015 at 2:53 PM, YaoPau
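The requirement above comes down to comparing the major.minor part of two version strings, while micro releases (2.7.5 vs 2.7.10) may differ. Here is a minimal sketch of that comparison. It assumes version strings in the format of `sys.version`. `major_minor` and `check_same_minor_version` are hypothetical helpers, not PySpark's own implementation of its 1.4+ check.

```python
import re
import sys


def major_minor(version_string):
    """Extract (major, minor) from a string such as '2.7.10 |Anaconda 2.3.0 (64-bit)| ...'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def check_same_minor_version(driver_version, worker_version):
    """Raise RuntimeError if driver and worker differ in major.minor; micro may differ."""
    driver = major_minor(driver_version)
    worker = major_minor(worker_version)
    if driver != worker:
        raise RuntimeError(
            "driver runs Python %d.%d but worker runs Python %d.%d" % (driver + worker)
        )


if __name__ == "__main__":
    # Anaconda 2.7.10 vs system 2.7.5 passes: same minor version.
    check_same_minor_version("2.7.10 |Anaconda 2.3.0 (64-bit)|", "2.7.5")
    # Anaconda 2.7.10 vs system 2.6.6 fails: the mismatch described in this thread.
    try:
        check_same_minor_version("2.7.10 |Anaconda 2.3.0 (64-bit)|", "2.6.6")
    except RuntimeError as err:
        print(err)
```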

Re: collect() works, take() returns ImportError: No module named iter

2015-08-10 Thread Jon Gregg
We did have 2.7 on the driver and 2.6 on the edge nodes, and figured that was the issue, so we've since tried many combinations, with all three of 2.6.6, 2.7.5, and Anaconda's 2.7.10 on each node and different PATHs and PYTHONPATHs each time. Every combination has produced the same error. We