Hello, my 2 cents: that will be an integration test that writes to a 'dev' database (which you might pre-populate before, and clean up after, your runs, so that you have repeatable data). Then you either:
1 - use plain SQL and assert that the values you stored from your DataFrame are the same as what you get back from your SQL query, or
2 - read the table back with Spark itself: just as there is DataFrame.write for writing, there is spark.read (SparkSession.read) for reading, which you can use to compare against the original DataFrame.
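A minimal sketch of option 1. It assumes Python's built-in sqlite3 in-memory database as a stand-in for the 'dev' database and a list of tuples as a stand-in for the rows of df2. In the real test, the write would go through writeTableToOracle and the read-back SQL would run against Oracle. The table name yearly_avg, the helper functions and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample rows standing in for df2.collect(): (YEAR, REGIONNAME, AVGPRICEPERYEAR)
EXPECTED_ROWS = [
    (2019, "Kensington and Chelsea", 1200000.0),
    (2020, "Kensington and Chelsea", 1250000.0),
]


def setup_dev_db():
    """Create a fresh, empty 'dev' database (in-memory stand-in for the real one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yearly_avg (YEAR INTEGER, REGIONNAME TEXT, AVGPRICEPERYEAR REAL)"
    )
    return conn


def write_rows(conn, rows):
    """Hypothetical stand-in for writeTableToOracle(df2, "overwrite", ...)."""
    conn.execute("DELETE FROM yearly_avg")  # emulate overwrite: old contents are replaced
    conn.executemany("INSERT INTO yearly_avg VALUES (?, ?, ?)", rows)
    conn.commit()


def read_back(conn):
    """Option 1: plain SQL, ordered so the comparison is deterministic."""
    cur = conn.execute(
        "SELECT YEAR, REGIONNAME, AVGPRICEPERYEAR FROM yearly_avg "
        "ORDER BY YEAR, REGIONNAME"
    )
    return cur.fetchall()


def test_write_round_trip():
    conn = setup_dev_db()
    try:
        write_rows(conn, EXPECTED_ROWS)
        # Compare what was stored with what was written, in the same order.
        assert read_back(conn) == sorted(EXPECTED_ROWS)
    finally:
        conn.close()  # clean up so the next run starts from the same state
```

Under pytest, setup_dev_db and the final close would naturally become a fixture. The point is that the assertion compares stored values with the values read back, so unlike a row-count check it can actually fail.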
HTH,
Marco

On Wed, Feb 3, 2021 at 4:51 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> It appears that the following assertion works assuming that result set can
> be = 0 (no data) or > 0 there is data
>
> assert df2.count() >= 0
>
> However, if I wanted to write to a JDBC database from PySpark through a
> function (already defined in another module) as below
>
> def writeTableToOracle(dataFrame,mode,dataset,tableName):
>     try:
>         dataFrame. \
>             write. \
>             format("jdbc"). \
>             option("url", oracle_url). \
>             option("dbtable", tableName). \
>             option("user", config['OracleVariables']['oracle_user']). \
>             option("password", config['OracleVariables']['oracle_password']). \
>             option("driver", config['OracleVariables']['oracle_driver']). \
>             mode(mode). \
>             save()
>     except Exception as e:
>         print(f"""{e}, quitting""")
>         sys.exit(1)
>
> and call it in the program
>
> from sparkutils import sparkstuff as s
>
> s.writeTableToOracle(df2,"overwrite",config['OracleVariables']['dbschema'],config['OracleVariables']['yearlyAveragePricesAllTable'])
>
> How can one assert its validity in PyTest?
>
> Thanks again
>
> On Wed, 3 Feb 2021 at 15:12, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> In Pytest you want to ensure that the composed DF has the correct return.
>>
>> Example
>>
>> df2 = house_df. \
>>     select( \
>>     F.date_format('datetaken', 'yyyy').cast("Integer").alias('YEAR') \
>>     , 'REGIONNAME' \
>>     , round(F.avg('averageprice').over(wSpecY)).alias('AVGPRICEPERYEAR') \
>>     , round(F.avg('flatprice').over(wSpecY)).alias('AVGFLATPRICEPERYEAR') \
>>     , round(F.avg('TerracedPrice').over(wSpecY)).alias('AVGTERRACEDPRICEPERYEAR') \
>>     , round(F.avg('SemiDetachedPrice').over(wSpecY)).alias('AVGSDPRICEPRICEPERYEAR') \
>>     , round(F.avg('DetachedPrice').over(wSpecY)).alias('AVGDETACHEDPRICEPERYEAR')). \
>>     distinct().orderBy('datetaken', ascending=True)
>>
>> Will that be enough to run just this command
>>
>> assert not []
>>
>> I believe that may be flawed because any error will be assumed to be NOT
>> NULL?
>>
>> Thanks
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> Disclaimer: Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
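On the PyTest question in the thread: `assert df2.count() >= 0` and `assert not []` can never fail (a count is never negative, and `not []` is always True), so they do not test anything. Below is a sketch of option 2 that can fail. It assumes the table has just been written with writeTableToOracle, reads it back with Spark's JDBC reader, and compares it with df2 while ignoring row order. The helper names read_table_via_jdbc and assert_same_rows are hypothetical. Column types may also come back from the database differently from df2, so a cast may be needed before comparing.

```python
from collections import Counter


def read_table_via_jdbc(spark, url, table, user, password, driver):
    """Option 2: read the table back with spark.read (there is no df.read).

    Hypothetical helper; the connection values would come from the same
    config['OracleVariables'] entries that writeTableToOracle uses.
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("driver", driver)
        .load()
    )


def assert_same_rows(expected_rows, actual_rows, columns):
    """Fail unless both collections hold the same values for `columns`.

    Row order is ignored, since a JDBC read does not guarantee any order.
    Rows may be pyspark Row objects or plain dicts; both support row[name].
    """
    def project(rows):
        # Multiset of value tuples, so duplicates are counted too.
        return Counter(tuple(row[c] for c in columns) for row in rows)

    exp, act = project(expected_rows), project(actual_rows)
    assert exp, "expected data is empty, so the comparison would prove nothing"
    assert exp == act, f"missing: {exp - act}, unexpected: {act - exp}"
```

In a test, this could follow the `s.writeTableToOracle(df2, "overwrite", ...)` call as `assert_same_rows(df2.collect(), read_table_via_jdbc(spark, ...).collect(), df2.columns)`, with the connection arguments filled in from your config.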