Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-13 Thread Sofia’s World
Hey Mich, glad to know you got to the bottom of it. In Python, if you want to run a module as a program - same as you would in Java/Scala - you have to define a main() function. You'll notice that the snippet I sent you had this syntax - if __name__ == "__main__": main(). I am guessing you just chose an
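
For reference, a minimal sketch of the entry-point pattern being described here (the file name and contents are illustrative, not Mich's actual runner):

    # runner.py -- illustrative layout only
    def main():
        numRows = 10                                        # a plain local of main()
        # lambdas defined inside main() can see numRows via closure
        print(list(map(lambda x: (x, numRows), range(3))))

    if __name__ == "__main__":   # only runs when the file is executed as a script
        main()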

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-13 Thread Mich Talebzadeh
Thanks all. Found the problem :( I had defined the runner in runner.py as class main(). I replaced it with def main(): and it worked without declaring numRows as global. I am still wondering why it works with def main()? Regards, Mich
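
A plausible explanation, sketched in plain Python with hypothetical names (this is a well-known scoping rule that would produce the behaviour described, not a confirmed diagnosis of the actual file): a class body is not an enclosing scope for lambdas defined inside it, whereas a function body is, so a name like numRows is only visible to the lambda in the def main() version.

    # Fails: class bodies are skipped when a nested lambda resolves free names
    class Runner:                  # mirrors the "class main()" attempt
        n = 10
        pairs = list(map(lambda x: (x, n), range(3)))   # NameError: name 'n' is not defined

    # Works: the lambda closes over the local n of main()
    def main():                    # mirrors the working "def main():"
        n = 10
        pairs = list(map(lambda x: (x, n), range(3)))
        print(pairs)               # [(0, 10), (1, 10), (2, 10)]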

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-13 Thread Sean Owen
I don't believe you'll be able to use globals in a Spark task, as they won't exist on the remote executor machines. On Sun, Dec 13, 2020 at 3:46 AM Mich Talebzadeh wrote: > thanks Marco. > > When I stripped down spark etc and ran your map, it came back OK (no > errors) WITHOUT global numRows >
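
A minimal sketch of the usual alternative, assuming a helper name of my own (build_rdd) and a trivial stand-in for the row-generating logic: pass the value in explicitly so it is captured in the lambda's closure and serialized with the task, rather than relying on a module-level global.

    from pyspark.sql import SparkSession

    def build_rdd(sc, num_rows):
        # num_rows is a local here; the lambda closes over it, so its value
        # travels with the task and executors need no global at all.
        return sc.parallelize(range(1, num_rows + 1)) \
                 .map(lambda x: (x, x % num_rows))

    spark = SparkSession.builder.appName("closure-capture-sketch").getOrCreate()
    print(build_rdd(spark.sparkContext, 10).take(3))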

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-13 Thread Sofia’s World
Sure Mich... let me try to run your code in my IDE. I'm intrigued by the error. Will report back whether or not I find something. Kind regards. On Sun, Dec 13, 2020, 9:46 AM Mich Talebzadeh wrote: > thanks Marco. > > When I stripped down spark etc and ran your map, it came back OK (no

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-12 Thread Sofia’s World
Hi Mich, I don't think it's a good idea... I believe your IDE is playing tricks on you. Take Spark out of the equation; this is a Python-only issue. I am guessing your IDE is somehow messing up your environment. If you take out the whole Spark code and replace it with this code: map(lambda x:
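
Something along these lines, with a stand-in clustered function since the real body lives in UsedFunctions.py:

    numRows = 10

    def clustered(x, numRows):      # placeholder body, for illustration only
        return (x - 1) / numRows

    print(list(map(lambda x: (x, clustered(x, numRows)), [1, 2, 3, 4])))
    # If this plain-Python version runs cleanly, the NameError is an
    # environment/IDE problem rather than a Spark one.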

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-12 Thread Mich Talebzadeh
I solved the issue of the variable numRows not being defined within the lambda function by declaring it as a global variable: global numRows, then numRows = 10 ## do in increments of 50K rows otherwise you blow up driver memory! Then I could refer to it within the lambda function as follows: rdd =
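
For completeness, a Spark-free sketch of the workaround as described, with a trivial stand-in for the mapped expression; the replies above explain why relying on a global here is fragile:

    global numRows    # at module level this is effectively a no-op declaration
    numRows = 10      ## do in increments of 50K rows otherwise you blow up driver memory!

    rdd_like = list(map(lambda x: (x, numRows), range(5)))   # stand-in for the rdd = ... map
    print(rdd_like)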

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-11 Thread Mich Talebzadeh
Many thanks KR. If I call the clustered function on its own it works: numRows = 10; print(uf.clustered(200, numRows)) returns 0.00199. If I run it all in one, including the UsedFunctions class, in the same .py file, it works. The code is attached. However, in PyCharm, I do the following

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-11 Thread Sofia’s World
Copying and pasting your code in a Jupyter notebook works fine; that is, using my own version of Range, which is simply a list of numbers. How about this... does this work fine? list(map(lambda x: (x, clustered(x, numRows)), [1,2,3,4])) If it does, I'd look into what's inside your Range and what you

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-11 Thread Mich Talebzadeh
Sorry, part of the code is not that visible:

    rdd = sc.parallelize(Range). \
        map(lambda x: (x, uf.clustered(x, numRows), \
                       uf.scattered(x, 1), \
                       uf.randomised(x, 1), \
                       uf.randomString(50), \

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-11 Thread Mich Talebzadeh
Thanks Sean. This is the code:

    numRows = 10  ## do in increments of 50K rows otherwise you blow up driver memory!
    #
    ## Check if the table exists, otherwise create it
    rows = 0
    sqltext = ""
    if (spark.sql(f"SHOW TABLES IN {DB} like '{tableName}'").count() == 1):
        rows = spark.sql(f"""SELECT

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-11 Thread Sean Owen
Looks like a simple Python error - you haven't shown the code that produces it. Indeed, I suspect you'll find there is no such symbol. On Fri, Dec 11, 2020 at 9:09 AM Mich Talebzadeh wrote: > Hi, > > This used to work but not anymore. > > I have UsedFunctions.py file that has these functions >
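
The kind of minimal reproduction Sean is pointing at: if the name is simply not defined in any scope the lambda can see, plain Python fails the same way, before Spark is involved at all.

    f = lambda x: (x, numRows)   # numRows is not defined anywhere
    f(1)                         # NameError: name 'numRows' is not defined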

Using Lambda function to generate random data in PySpark throws not defined error

2020-12-11 Thread Mich Talebzadeh
Hi, this used to work but not anymore. I have a UsedFunctions.py file that has these functions:

    import random
    import string
    import math

    def randomString(length):
        letters = string.ascii_letters
        result_str = ''.join(random.choice(letters) for i in range(length))
        return result_str

    def
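
Presumably the module is then imported in the driver script roughly like this (the uf alias matches the later snippets in the thread; the call is just a smoke test):

    import UsedFunctions as uf

    print(uf.randomString(10))   # e.g. 'kQzRbWxYmA' -- ten random ASCII letters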