Re: Online classes for spark topics

2023-03-08 Thread Sofia’s World
+1

On Wed, Mar 8, 2023 at 10:40 PM Winston Lai wrote:
> +1, any webinar on Spark related topic is appreciated
>
> Thank You & Best Regards
> Winston Lai
> --
> *From:* asma zgolli
> *Sent:* Thursday, March 9, 2023 5:43:06 AM
> *To:* karan alang
> *Cc:* Mich

Re: Testing ETL with Spark using Pytest

2021-02-09 Thread Sofia’s World
Hey Mich, my 2 cents on top of Jerry's: for reusable @fixtures across your tests, I'd leverage conftest.py and put all of them there, if the number is not too big. Otherwise, as you say, you can create tests\fixtures and place all of them there. In terms of extractHiveDAta, for a @fixture it is
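A minimal sketch of such a conftest.py, assuming a local SparkSession is acceptable for unit tests. `hive_like_rows` and `hive_like_df` are hypothetical names. They only illustrate how a fixture could stand in for extractHiveDAta, whose real implementation isn't shown here.

```python
# conftest.py: pytest makes the fixtures defined here available to every
# test module in this directory and its subdirectories, without imports.
import pytest


@pytest.fixture(scope="session")
def spark():
    # Imported here rather than at module level so that test modules which
    # never request `spark` can still be collected without PySpark installed.
    from pyspark.sql import SparkSession

    # One local SparkSession for the whole run: starting a JVM per test
    # would make the suite very slow.
    session = (
        SparkSession.builder.master("local[1]")
        .appName("pytest-pyspark")
        .getOrCreate()
    )
    yield session
    session.stop()


def hive_like_rows():
    # Hypothetical sample data standing in for what extractHiveDAta would
    # read from Hive; a plain function so it is easy to reuse and inspect.
    return [(1, "alice", 10.0), (2, "bob", 20.5)]


@pytest.fixture
def hive_like_df(spark):
    # Hypothetical fixture: a small in-memory DataFrame, so tests of the
    # transformation logic don't need a live Hive metastore.
    return spark.createDataFrame(hive_like_rows(), ["id", "name", "amount"])
```

A test module then only declares the fixture as a parameter, e.g. `def test_count(hive_like_df): assert hive_like_df.count() == 2`, and pytest wires in the shared session.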

Re: Assertion of return value of dataframe in pytest

2021-02-03 Thread Sofia’s World
Hello, my 2 cents: well, that will be an integration test that writes to a 'dev' database (which you might pre-populate and clean up after your runs, so you can have repeatable data). Then either you 1 - use normal SQL and assert that the values you store in your dataframe are the same as what you get
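A minimal sketch of option 1, using Python's built-in sqlite3 in-memory database as a stand-in for the 'dev' database and a plain list of tuples as a stand-in for the DataFrame's rows. In a real test Spark would do the write itself (for example with `df.write.jdbc(...)`); the table name `sales` and all helper functions here are hypothetical.

```python
import sqlite3

# Stand-in for the rows your DataFrame holds, e.g. what df.collect() returns.
# Hypothetical data.
EXPECTED_ROWS = [(1, 10.0), (2, 20.5), (3, 7.25)]


def write_rows(conn, rows):
    """Stand-in for the Spark write step: load rows into the dev table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO sales (id, amount) VALUES (?, ?)", rows)
    conn.commit()


def read_back_with_sql(conn):
    """Plain SQL read-back; ORDER BY makes the comparison deterministic."""
    return conn.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()


def clean_up(conn):
    """Drop the test table so the next run starts from the same state."""
    conn.execute("DROP TABLE IF EXISTS sales")
    conn.commit()


def run_roundtrip_check(conn, rows):
    try:
        write_rows(conn, rows)
        # Values read back with SQL must match what the "DataFrame" held.
        assert read_back_with_sql(conn) == sorted(rows)
    finally:
        # Clean up even if the assertion fails, keeping runs repeatable.
        clean_up(conn)


if __name__ == "__main__":
    run_roundtrip_check(sqlite3.connect(":memory:"), EXPECTED_ROWS)
    print("round trip OK")
```

The `try`/`finally` is the "clean up after your runs" part: even a failing assertion leaves the database empty for the next run.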

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-13 Thread Sofia’s World

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-13 Thread Sofia’s World

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-12 Thread Sofia’s World
>> ...rt + numRows - 1
>>
>> print ("starting at ID = ",start, ",ending on = ",end)
>>
>> Range = range(start, end+1)
>>
>> ## This traverses through the Range and increment "x" by one unit each
>> time, and that x value i
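The quoted snippet appears to compute an inclusive end ID and then build `range(start, end+1)`. A minimal sketch of that arithmetic, assuming the truncated line is `end = start + numRows - 1`, shows why the `- 1` and the `+ 1` together yield exactly numRows IDs. `id_range` is a hypothetical helper, not part of the original code.

```python
def id_range(start: int, num_rows: int) -> range:
    """Hypothetical helper: the num_rows consecutive IDs beginning at start."""
    # Assumed from the truncated quote: inclusive last ID.
    end = start + num_rows - 1
    # range() stops *before* its second argument, hence end + 1.
    return range(start, end + 1)


ids = id_range(start=1, num_rows=5)
print("starting at ID =", ids[0], ", ending on =", ids[-1])
print(list(ids))  # [1, 2, 3, 4, 5]
```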

Re: Using Lambda function to generate random data in PySpark throws not defined error

2020-12-11 Thread Sofia’s World
Copying and pasting your code into a Jupyter notebook works fine, that is, using my own version of Range, which is simply a list of numbers. How about this: does this work fine? list(map(lambda x: (x, clustered(x, numRows)),[1,2,3,4])) If it does, I'd look at what's inside your Range and what you
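A runnable version of that sanity check. The thread's `clustered` function isn't shown, so the one below is a hypothetical stand-in. The second part illustrates one plausible, unconfirmed source of a "not defined" error: Python resolves names inside a lambda only when the lambda is called, so a missing name surfaces at call time rather than where the lambda is written.

```python
def clustered(x, numRows):
    # Hypothetical stand-in: the real clustered() from the thread isn't shown.
    return x // numRows


numRows = 2

# The suggested check: a plain list in place of Range.
print(list(map(lambda x: (x, clustered(x, numRows)), [1, 2, 3, 4])))
# [(1, 0), (2, 1), (3, 1), (4, 2)]

# Writing a lambda that uses an unknown name is fine; calling it is not.
try:
    list(map(lambda x: undefined_helper(x), [1]))  # noqa: F821
except NameError as err:
    print("NameError:", err)
```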

Re: Need Unit test complete reference for Pyspark

2020-11-19 Thread Sofia’s World
Hey, they are good libraries to get you started; I have used both of them. Unfortunately, as far as I saw when I started to use them, only a few people maintain them. But you can get pointers from them for writing tests. The code below can get you started. What you'll need is - a method to
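As a hedged illustration of one building block such a test suite usually needs, here is a hypothetical `assert_rows_equal` helper that compares collected rows without depending on row order, since Spark doesn't guarantee order unless you sort. It is plain Python, so it also accepts `df.collect()` output, because PySpark's `Row` is a tuple subclass.

```python
from collections import Counter
from typing import Iterable, Sequence


def assert_rows_equal(actual: Iterable[Sequence], expected: Iterable[Sequence]) -> None:
    """Hypothetical helper: order-insensitive comparison of collected rows.

    Counter (a multiset) keeps duplicate rows significant, unlike a set.
    Column values must be hashable, so rows with list/map columns would
    need converting first.
    """
    actual_counts = Counter(tuple(row) for row in actual)
    expected_counts = Counter(tuple(row) for row in expected)
    if actual_counts != expected_counts:
        missing = expected_counts - actual_counts
        unexpected = actual_counts - expected_counts
        raise AssertionError(
            f"missing rows: {dict(missing)}; unexpected rows: {dict(unexpected)}"
        )
```

In a PySpark test this would be called as `assert_rows_equal(result_df.collect(), [(1, "a"), (2, "b")])`.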

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Sofia’s World
Hey, my 2 cents on CI/CD for PySpark: you can leverage pytest + Holden Karau's Spark testing libs for CI, thus giving you `almost` the same functionality as Scala. I say almost because in Scala you have nice and descriptive FunSpecs. For me, the choice is based on expertise. Having worked with teams which
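pytest can recover some of the readability of ScalaTest's FunSpec through class grouping, descriptive test names, and `ids=` labels in `pytest.mark.parametrize`. A minimal sketch; `normalise_column_name` is a hypothetical pure-Python ETL helper, chosen so the example runs without Spark.

```python
import pytest


def normalise_column_name(name: str) -> str:
    """Hypothetical ETL helper: lower-case a column name and join its words
    with underscores, e.g. before renaming columns with df.toDF(*names)."""
    return "_".join(name.strip().lower().split())


class TestNormaliseColumnName:
    """Groups related cases, loosely like a FunSpec describe(...) block."""

    def test_lower_cases_the_name(self):
        assert normalise_column_name("Amount") == "amount"

    def test_replaces_inner_spaces_with_underscores(self):
        assert normalise_column_name("Order Date") == "order_date"

    @pytest.mark.parametrize(
        "raw, expected",
        [("  Id ", "id"), ("Customer  Name", "customer_name")],
        # Shown in `pytest -v` output next to the test name.
        ids=["strips-surrounding-whitespace", "collapses-repeated-spaces"],
    )
    def test_edge_cases(self, raw, expected):
        assert normalise_column_name(raw) == expected
```

Running `pytest -v` then lists each case under the class with its descriptive name, which gets reasonably close to a spec-style report in CI logs.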