Just FYI: the streaming Python data source is in progress (https://github.com/apache/spark/pull/44416); we will likely release this in Spark 4.0.
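Since the API in that PR is still under review, here is a shape-only sketch of the streaming reader contract it proposes (initialOffset / latestOffset / partitions / read / commit). It uses plain Python classes rather than the real `pyspark.sql.datasource` base classes, because the names and signatures may still change before Spark 4.0 ships; treat everything below as an illustration, not the final API.

```python
# Shape-only sketch of the proposed Python streaming data source reader.
# In Spark 4.0 these would subclass pyspark.sql.datasource.DataSource /
# DataSourceStreamReader; plain classes are used here because the API in
# the PR is still in flux. Method names follow the PR and may change.

class RangePartition:
    """One input split: read offsets in [start, end)."""
    def __init__(self, start, end):
        self.start, self.end = start, end


class CounterStreamReader:
    """Toy source that emits consecutive integers, `rows_per_batch` per micro-batch."""
    def __init__(self, rows_per_batch=5):
        self.rows_per_batch = rows_per_batch
        self._latest = 0

    def initialOffset(self):
        # Offsets are plain dicts in the proposed API.
        return {"offset": 0}

    def latestOffset(self):
        # Each call advances the range of available data by one batch.
        self._latest += self.rows_per_batch
        return {"offset": self._latest}

    def partitions(self, start, end):
        # One partition covering the whole offset range, for simplicity.
        return [RangePartition(start["offset"], end["offset"])]

    def read(self, partition):
        # Yield tuples matching the source's schema, e.g. "value INT".
        for v in range(partition.start, partition.end):
            yield (v,)

    def commit(self, end):
        # Called once a batch is durably processed; a real source could
        # trim internal state or acknowledge the upstream system here.
        pass
```

With the real API, a `DataSource` subclass would expose this reader via `streamReader()`, be registered with `spark.dataSource.register(...)`, and then be consumed with `spark.readStream.format(...)`.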
On Thu, Dec 28, 2023 at 4:53 PM Поротиков Станислав Вячеславович <s.poroti...@skbkontur.ru.invalid> wrote:

> Yes, it's actual data.
>
> Best regards,
> Stanislav Porotikov
>
> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
> *Sent:* Wednesday, December 27, 2023 9:43 PM
> *Cc:* user@spark.apache.org
> *Subject:* Re: Pyspark UDF as a data source for streaming
>
> Is this generated data actual data, or are you testing the application?
>
> This sounds like a form of Lambda architecture, with some decision/processing not far from the attached diagram.
>
> HTH
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On Wed, 27 Dec 2023 at 13:26, Поротиков Станислав Вячеславович <s.poroti...@skbkontur.ru> wrote:
>
> > Actually it's JSON with a specific structure from the API server.
> > But the task is to constantly check whether new data appears on the API server and load it into Kafka.
> > The full pipeline can be presented like this:
> >
> > REST API -> Kafka -> some processing -> Kafka/Mongo -> …
> >
> > Best regards,
> > Stanislav Porotikov
> >
> > *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
> > *Sent:* Wednesday, December 27, 2023 6:17 PM
> > *To:* Поротиков Станислав Вячеславович <s.poroti...@skbkontur.ru.invalid>
> > *Cc:* user@spark.apache.org
> > *Subject:* Re: Pyspark UDF as a data source for streaming
> >
> > OK, so you want to generate some random data, load it into Kafka on a regular interval, and then do the rest?
> >
> > HTH
> >
> > Mich Talebzadeh
> >
> > On Wed, 27 Dec 2023 at 12:16, Поротиков Станислав Вячеславович <s.poroti...@skbkontur.ru.invalid> wrote:
> >
> > > Hello!
> > >
> > > Is it possible to write a PySpark UDF that generates data for a streaming dataframe?
> > > I want to get data from REST API requests in real time and save this data to a dataframe, and then put it into Kafka.
> > > I can't figure out how to create a streaming dataframe from generated data.
> > >
> > > I am new to Spark streaming. Could you give me some hints?
> > >
> > > Best regards,
> > > Stanislav Porotikov
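The "check constantly if new data appears on the API server" step of the pipeline above can be sketched in plain Python, independent of how it is eventually hosted (a driver-side loop, foreachBatch, or a custom data source). In this sketch, `fetch_records` and `publish` are hypothetical stand-ins: in the real pipeline, `fetch_records` would be an HTTP GET against the REST API (e.g. with `urllib` or `requests`), `publish` would be a Kafka producer send, and the assumption that each record carries a unique `"id"` field is mine, not from the thread.

```python
# One polling cycle of the REST-API -> Kafka step: fetch the current
# records, drop ones already forwarded in earlier cycles, and hand the
# rest (JSON-encoded) to a publisher. fetch_records and publish are
# hypothetical stand-ins for an HTTP GET and a Kafka producer send.

import json


def poll_once(fetch_records, publish, seen_ids):
    """Fetch, deduplicate by the (assumed) unique "id" field, publish.

    fetch_records() -> list of dicts, each with an "id" key.
    publish(payload) -> sends one JSON string downstream.
    seen_ids        -> mutable set tracking already-forwarded ids.
    Returns the number of newly published records.
    """
    new = 0
    for record in fetch_records():
        if record["id"] in seen_ids:
            continue  # already forwarded in an earlier cycle
        seen_ids.add(record["id"])
        publish(json.dumps(record))
        new += 1
    return new
```

In production this loop would run on a schedule (or per micro-batch trigger), and the seen-id state would need to be persisted, or replaced by a cursor/timestamp parameter if the API supports incremental queries.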