Re: Testing ETL with Spark using Pytest

2021-02-09 Thread Mich Talebzadeh
Many thanks Marco. Points noted, and other points/criticism are equally welcome. In a forum like this we do not disagree, we just agree to differ, so to speak, and share ideas. I will review my code and take on board your suggestions. Regards, Mich

Re: Testing ETL with Spark using Pytest

2021-02-09 Thread Sofia’s World
Hey Mich, my 2 cents on top of Jerry's. For reusable @fixtures across your tests, I'd leverage conftest.py and put all of them there, if the number is not too big. Otherwise, as you say, you can create tests\fixtures and place all of them there. In terms of extractHiveData, for a @fixture it is

Re: Testing ETL with Spark using Pytest

2021-02-09 Thread Mich Talebzadeh
Interesting points Jerry. I do not know how much benefit atomising the unit tests brings. For example we have

    @pytest.fixture(scope = "session")
    def extractHiveData():
        # read data through jdbc from Hive
        spark_session = s.spark_session(ctest['common']['appName'])
        tableName =

Re: Testing ETL with Spark using Pytest

2021-02-09 Thread Jerry Vinokurov
Sure, I think it makes sense in many cases to break things up like this. Looking at your other example I'd say that you might want to break up extractHiveData into several fixtures (one for session, one for config, one for the df) because in my experience fixtures like those are reused constantly
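The split Jerry suggests (one fixture for the config, one for the session, one for the DataFrame) could look roughly like the sketch below. This is only an illustration under stated assumptions: the names `load_config`, `config`, `spark_session` and `hive_df`, the `hive` config keys, and the JDBC URL and table name are all hypothetical and do not come from Mich's code; only `ctest['common']['appName']` is taken from the thread. PySpark is imported lazily inside the fixture so that the module can still be imported where PySpark is not installed.

```python
import pytest


def load_config():
    """Hypothetical config, shaped loosely like the ctest dict in the thread."""
    return {
        "common": {"appName": "etl-test"},
        # The keys and values below are assumptions made for this sketch.
        "hive": {
            "url": "jdbc:hive2://localhost:10000/default",
            "table": "example_table",
        },
    }


@pytest.fixture(scope="session")
def config():
    # One fixture just for configuration, so any test can request it alone.
    return load_config()


@pytest.fixture(scope="session")
def spark_session(config):
    # Imported here so that importing this module does not require PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName(config["common"]["appName"]).getOrCreate()
    yield spark
    # Runs once, after the last test in the session has finished.
    spark.stop()


@pytest.fixture(scope="session")
def hive_df(spark_session, config):
    # Depends on the two fixtures above purely through its parameter names.
    hive = config["hive"]
    return spark_session.read.jdbc(url=hive["url"], table=hive["table"])
```

A test that only needs the configuration can then take `config` as a parameter without starting Spark at all, which is one practical benefit of splitting the fixtures this way.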

Re: Testing ETL with Spark using Pytest

2021-02-09 Thread Mich Talebzadeh
Thanks Jerry for your comments. The easiest option, and I concur, is to have all these fixture files currently under the fixtures package lumped together in conftest.py under the tests package. Then you can get away from the fixtures package altogether and it works. However, I gather plug and play becomes less

Re: Testing ETL with Spark using Pytest

2021-02-09 Thread Jerry Vinokurov
Hi Mich, I'm a bit confused by what you mean when you say that you cannot call a fixture in another fixture. The fixtures resolve dependencies among themselves by means of their named parameters. So that means that if I have a fixture

    @pytest.fixture
    def fixture1():
        return SomeObj()

and
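As a minimal sketch of the name-based resolution Jerry describes: a fixture requests another fixture simply by listing its name as a parameter. Only `fixture1` and `SomeObj` appear in the message; the body of `SomeObj`, the helper `describe`, the second fixture `fixture2` and the test are hypothetical additions for illustration.

```python
import pytest


class SomeObj:
    """Hypothetical stand-in for whatever object the first fixture builds."""

    def __init__(self):
        self.name = "some-obj"


def describe(obj):
    """Plain helper, kept outside the fixtures so it can be called directly."""
    return f"wrapped:{obj.name}"


@pytest.fixture
def fixture1():
    return SomeObj()


@pytest.fixture
def fixture2(fixture1):
    # pytest matches the parameter name "fixture1" to the fixture above,
    # runs it first, and passes its return value in here.
    return describe(fixture1)


def test_fixture_chain(fixture2):
    # Requesting fixture2 is enough; fixture1 is resolved transitively.
    assert fixture2 == "wrapped:some-obj"
```

Note that the fixtures are never called by hand. pytest builds the dependency chain from the parameter names when it collects the test.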

Testing ETL with Spark using Pytest

2021-02-09 Thread Mich Talebzadeh
I was a bit confused with the use of fixtures in Pytest with the dataframes passed as an input pipeline from one fixture to another. I wrote this after spending some time on it. As usual it is heuristic rather than anything overtly by the book so to speak. In PySpark and PyCharm you can ETTL from