Hi Mich
I don't think it's a good idea... I believe your IDE is playing tricks on
you.
Take Spark out of the equation; this is a Python issue only. I am guessing
your IDE is somehow messing up your environment.
If you take out the whole Spark code and replace it with a plain Python
map(lambda x: ...), you can check whether the variable is visible inside
the lambda with Spark out of the picture.
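
Something like this, for example (a minimal sketch; the numbers and the use
of numRows are just for illustration):

numRows = 10  # module-level variable, same scope your lambda would see

# A lambda passed to map() can read a module-level name directly;
# no special declaration is needed just to read it.
result = list(map(lambda x: x * numRows, range(5)))
print(result)  # [0, 10, 20, 30, 40]

If that works in a plain Python shell but fails in your IDE, then the IDE
(or its interpreter/environment settings) is the problem.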
I solved the issue of the variable numRows not being defined within the
lambda function by declaring it as a global variable:

global numRows
numRows = 10  ## do it in increments of 50K rows, otherwise you blow up driver memory!
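
For what it's worth, in plain Python the global statement only matters when
you assign to the name inside a function; merely reading a module-level name
from a lambda needs no declaration at all. A minimal sketch (names made up
for illustration):

numRows = 0

def set_num_rows(n):
    global numRows  # required only because we assign to numRows here
    numRows = n

set_num_rows(10)
# Reading numRows inside a lambda works without any 'global':
print((lambda x: x + numRows)(5))  # prints 15

Note that a bare "global numRows" at module level, outside any function,
has no effect at all.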
Then I could call it within the lambda function as follows:

rdd = ...
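
For comparison, the same pattern needs no global declaration on the Spark
side either. A minimal sketch (assuming pyspark is installed; the
transformation is illustrative, not your actual code):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()
numRows = 10

# numRows is captured in the closure Spark ships to the executors;
# it can be read inside the lambda without any 'global' statement.
rdd = sc.parallelize(range(100)).map(lambda x: x % numRows)
print(rdd.take(5))  # [0, 1, 2, 3, 4]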