Hard to answer succinctly, but I'll give it a shot. Cucumber is a tool for writing *Behaviour* Driven Tests (closely related to behaviour driven development, BDD). It is not merely a *technical* approach to testing but a mindset, a way of working, and a different way to structure communication between product and R&D (different; whether it is better is a matter of controversy).
I will not elaborate further, as there is plenty of material out there if you want to educate yourself. Just bear in mind that BDD is riddled with misconceptions. More often than not I see people just using Cucumber without doing actual BDD.

Regarding unit testing, I do not consider the code you showed to be a good candidate for it. There is very little procedural logic there, and there is a good chance that if you go about unit testing it you will end up with lots and lots of mocks tightly bound to the implementation details of the code under test, rendering the tests brittle and unmaintainable. I would argue that unit tests are more appropriate for code that is algorithmic in nature, that has few or no dependencies, and for which you have an absolute oracle of truth regarding what you expect from it.

I think that in your situation, going for integration tests (on small-scale data) and regression tests would give you the most ROI.

On Thu, Nov 15, 2018 at 8:43 PM ☼ R Nair <ravishankar.n...@gmail.com> wrote:

> Sparklens from qubole is a good source. Other tests are to be handled by
> developer.
>
> Best,
> Ravi
>
> On Thu, Nov 15, 2018, 12:45 PM <omer.ozsaka...@sony.com wrote:
>
>> Hi all,
>>
>> How are you testing your Spark applications?
>>
>> We are writing features by using Cucumber. This is testing the
>> behaviours. Is this called functional test or integration test?
>>
>> We are also planning to write unit tests.
>>
>> For instance we have a class like below. It has one method. This method
>> is implementing several things: like DataFrame operations, saving DataFrame
>> into database table, insert, update,delete statements.
>>
>> Our classes generally contains 2 or 3 methods. These methods cover a lot
>> of tasks in the same function definition. (like the function below)
>>
>> So I am not sure how I can write unit tests for these classes and methods.
>>
>> Do you have any suggestion?
>>
>> class CustomerOperations
>>
>>    def doJob(inputDataFrame : DataFrame) = {
>>        // definitions (value/variable)
>>        // spark context, session etc definition
>>
>>        // filtering, cleansing on inputDataframe and save results on
>>        // a new dataframe
>>        // insert new dataframe to a database table
>>        // several insert/update/delete statements on the database
>>        // tables
>>
>>    }
>>
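
To make the small-scale integration test idea a bit more concrete for a method like the quoted doJob, here is a rough sketch, not a recommendation of a specific design. It assumes the filtering/cleansing part can be pulled out into a plain DataFrame => DataFrame function, kept apart from the database writes. The names CustomerTransformations and cleanse, the id and name columns, and the cleansing rules themselves are all made up for illustration; your real logic will differ.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, trim}

// Hypothetical refactoring: the transformation lives in a function that only
// depends on its input DataFrame, so it can be exercised without a database.
object CustomerTransformations {
  // Assumed cleansing rules (illustration only): drop rows without a name,
  // trim whitespace from the name, keep one row per id.
  def cleanse(input: DataFrame): DataFrame =
    input
      .filter(col("name").isNotNull)
      .withColumn("name", trim(col("name")))
      .dropDuplicates("id")
}

object CustomerTransformationsCheck {
  def main(args: Array[String]): Unit = {
    // A local, single-threaded session is enough for a handful of rows.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("customer-transformations-check")
      .getOrCreate()
    import spark.implicits._

    try {
      // Small hand-written input covering each assumed rule.
      val input = Seq(
        (1, " Alice "),      // needs trimming
        (1, " Alice "),      // duplicate id
        (2, null: String),   // missing name
        (3, "Bob")
      ).toDF("id", "name")

      val result = CustomerTransformations.cleanse(input)
        .as[(Int, String)]
        .collect()
        .toSet

      // Compare as a set, since row order is not guaranteed.
      assert(result == Set((1, "Alice"), (3, "Bob")))
    } finally {
      spark.stop()
    }
  }
}
```

The same check can be moved into whatever test framework you already use. The insert/update/delete part would still need a test against a real (small) database, which is where the integration and regression tests come in.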