Hey,

They are good libraries to get you started; I have used both of them.
Unfortunately, as far as I could see when I started using them, only a
few people maintain them.
But you can still take pointers from them for writing tests, and the
code below can get you started.
What you'll need is:

- a method to create a DataFrame on the fly, perhaps from a string. You
can have a look at pandas; it has methods for that
- a method to test DataFrame equality. You can use df1.subtract(df2)
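For the first point, here is a minimal sketch of building a Spark DataFrame from a CSV string by going through pandas. The helper name `df_from_csv_string` and the `spark` session parameter are my own choices for illustration, not an existing API:

```python
from io import StringIO

import pandas as pd


def df_from_csv_string(spark, csv_text):
    # Parse the CSV string with pandas, then hand the resulting
    # pandas DataFrame to Spark, which infers the schema from it.
    pdf = pd.read_csv(StringIO(csv_text))
    return spark.createDataFrame(pdf)
```

You would call it from a test as, say, df_from_csv_string(spark_session, "first,second\none,two").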

I am assuming you are working with DataFrames rather than RDDs, for
which the two packages you mention should have everything you need.

hth,
 Marco


import pytest
from pyspark.sql import SparkSession

@pytest.fixture
def spark_session():
    return SparkSession.builder \
        .master('local[1]') \
        .appName('SparkByExamples.com') \
        .getOrCreate()


def test_create_table(spark_session):
    df = spark_session.createDataFrame([['one', 'two']]).toDF('first', 'second')
    df.show()

    df2 = spark_session.createDataFrame([['one', 'two']]).toDF('first', 'second')

    # subtract returns the rows of df that do not appear in df2
    assert df.subtract(df2).count() == 0
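One caveat on that assertion: subtract is one-directional, so it only catches rows in df that are missing from df2, and would pass if df2 had extra rows. A sketch of a symmetric check (assert_df_equal is my own helper name, not a Spark API):

```python
def assert_df_equal(df1, df2):
    # subtract() only returns rows present in the left frame but absent
    # from the right, so check both directions to catch extra rows on
    # either side.
    assert df1.subtract(df2).count() == 0
    assert df2.subtract(df1).count() == 0
```

Note that subtract behaves like SQL EXCEPT DISTINCT, so this check ignores duplicate-row counts and row order.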




On Thu, Nov 19, 2020 at 6:38 AM Sachit Murarka <connectsac...@gmail.com>
wrote:

> Hi Users,
>
> I have to write Unit Test cases for PySpark.
> I think pytest-spark and "spark testing base" are good test libraries.
>
> Can anyone please provide full reference for writing the test cases in
> Python using these?
>
> Kind Regards,
> Sachit Murarka
>
