Hello
 my 2cents/./
well that will be an integ test to write to a 'dev' database. (which you
might pre-populate and clean up after your runs, so you can have repeatable
data).
then either you
1 - use normal sql and assert that the values you store in your dataframe
are the same as what you get from your sql
2 - surely , as there is a dataframe.write,  there would be also a
dataframe.read that you can use?


hth
 Marco



On Wed, Feb 3, 2021 at 4:51 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> It appears that the following assertion works assuming that result set can
> be = 0 (no data) or > 0 there is data
>
> assert df2.count() >= 0
>
> However, if I wanted to write to a JDBC database from PySpark through a
> function (already defined in another module) as below
>
>
> def writeTableToOracle(dataFrame,mode,dataset,tableName):
>
>     try:
>
>         dataFrame. \
>
>             write. \
>
>             format("jdbc"). \
>
>             option("url", oracle_url). \
>
>             option("dbtable", tableName). \
>
>             option("user", config['OracleVariables']['oracle_user']). \
>
>             option("password",
> config['OracleVariables']['oracle_password']). \
>
>             option("driver", config['OracleVariables']['oracle_driver']). \
>
>             mode(mode). \
>
>             save()
>
>     except Exception as e:
>
>         print(f"""{e}, quitting""")
>
>         sys.exit(1)
>
>
> and call it in the program
>
>
> from sparkutils import sparkstuff as s
>
>
> s.writeTableToOracle(df2,"overwrite",config['OracleVariables']['dbschema'],config['OracleVariables']['yearlyAveragePricesAllTable'])
>
>
> How can one assert its validity in PyTest?
>
>
> Thanks again
>
> On Wed, 3 Feb 2021 at 15:12, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi,
>>
>> In Pytest you want to ensure that the composed DF has the correct return.
>>
>> Example
>>
>>     df2 = house_df. \
>>         select( \
>>         F.date_format('datetaken', 'yyyy').cast("Integer").alias('YEAR') \
>>         , 'REGIONNAME' \
>>         ,
>> round(F.avg('averageprice').over(wSpecY)).alias('AVGPRICEPERYEAR') \
>>         ,
>> round(F.avg('flatprice').over(wSpecY)).alias('AVGFLATPRICEPERYEAR') \
>>         ,
>> round(F.avg('TerracedPrice').over(wSpecY)).alias('AVGTERRACEDPRICEPERYEAR')
>> \
>>         ,
>> round(F.avg('SemiDetachedPrice').over(wSpecY)).alias('AVGSDPRICEPRICEPERYEAR')
>> \
>>         ,
>> round(F.avg('DetachedPrice').over(wSpecY)).alias('AVGDETACHEDPRICEPERYEAR')).
>> \
>>         distinct().orderBy('datetaken', asending=True)
>>
>> Will that be enough to run just this command
>>
>>   assert not []
>>
>> I believe that may be flawed because any error will be assumed to be NOT
>> NULL?
>>
>> Thanks
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>

Reply via email to