Hi Satyajit,
For the query/join part there is a couple of approaches.
1. create a dataframe from all incoming streaming batch (i.e. actually an
rdd) and join with your reference data (coming from existing table) 2. you
can use structure streaming that basically consists of the schema in every
Hi Satyajit,
For the query/join part there is a couple of approaches.
1. create a dataframe from all incoming streaming batch (i.e. actually an
rdd) and join with your reference data (coming from existing table)
2. you can use structure streaming that basically consists of schema in
every batch
You can do a join between streaming dataset and a static dataset. I would
prefer your first approach. But the problem with this approach is
performance.
Unless you cache the dataset , every time you fire a join query it will
fetch the latest records from the table.
Regards,
Rishitesh Mishra,
Hi All,
I working on real time reporting project and i have a question about
structured streaming job, that is going to stream a particular table
records and would have to join to an existing table.
Stream > query/join to another DF/DS ---> update the Stream data record.
Now i have a