Just FYI: the streaming Python data source is in progress (https://github.com/apache/spark/pull/44416); we will likely release this in Spark 4.0.
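Since the API in that PR is still under review, here is a shape-only sketch of the streaming reader contract it proposes (initialOffset / latestOffset / partitions / read / commit). It uses plain Python classes rather than the real `pyspark.sql.datasource` base classes, because the names and signatures may still change before Spark 4.0 ships; treat everything below as an illustration, not the final API.

```python
# Shape-only sketch of the proposed Python streaming data source reader.
# In Spark 4.0 these would subclass pyspark.sql.datasource.DataSource /
# DataSourceStreamReader; plain classes are used here because the API in
# the PR is still in flux. Method names follow the PR and may change.

class RangePartition:
    """One input split: read offsets in [start, end)."""
    def __init__(self, start, end):
        self.start, self.end = start, end


class CounterStreamReader:
    """Toy source that emits consecutive integers, `rows_per_batch` per micro-batch."""
    def __init__(self, rows_per_batch=5):
        self.rows_per_batch = rows_per_batch
        self._latest = 0

    def initialOffset(self):
        # Offsets are plain dicts in the proposed API.
        return {"offset": 0}

    def latestOffset(self):
        # Each call advances the range of available data by one batch.
        self._latest += self.rows_per_batch
        return {"offset": self._latest}

    def partitions(self, start, end):
        # One partition covering the whole offset range, for simplicity.
        return [RangePartition(start["offset"], end["offset"])]

    def read(self, partition):
        # Yield tuples matching the source's schema, e.g. "value INT".
        for v in range(partition.start, partition.end):
            yield (v,)

    def commit(self, end):
        # Called once a batch is durably processed; a real source could
        # trim internal state or acknowledge the upstream system here.
        pass
```

With the real API, a `DataSource` subclass would expose this reader via `streamReader()`, be registered with `spark.dataSource.register(...)`, and then be consumed with `spark.readStream.format(...)`.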
On Thu, Dec 28, 2023 at 4:53 PM Поротиков Станислав Вячеславович <s.poroti...@skbkontur.ru.invalid> wrote:

> Yes, it's actual data.
>
> Best regards,
> Stanislav Porotikov
>
> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
> *Sent:* Wednesday, December 27, 2023 9:43 PM
> *Cc:* user@spark.apache.org
> *Subject:* Re: Pyspark UDF as a data source for streaming
>
> Is this generated data actual data, or are you testing the application?
>
> This sounds like a form of Lambda architecture, with some decision/processing not far from the attached diagram.
>
> HTH
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On Wed, 27 Dec 2023 at 13:26, Поротиков Станислав Вячеславович <s.poroti...@skbkontur.ru> wrote:
>
> > Actually it's JSON with a specific structure from the API server.
> > But the task is to constantly check whether new data appears on the API server and load it into Kafka.
> > The full pipeline can be presented like this:
> >
> > REST API -> Kafka -> some processing -> Kafka/Mongo -> …
> >
> > Best regards,
> > Stanislav Porotikov
> >
> > *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
> > *Sent:* Wednesday, December 27, 2023 6:17 PM
> > *To:* Поротиков Станислав Вячеславович <s.poroti...@skbkontur.ru.invalid>
> > *Cc:* user@spark.apache.org
> > *Subject:* Re: Pyspark UDF as a data source for streaming
> >
> > OK, so you want to generate some random data, load it into Kafka on a regular interval, and then do the rest?
> >
> > HTH
> >
> > Mich Talebzadeh
> >
> > On Wed, 27 Dec 2023 at 12:16, Поротиков Станислав Вячеславович <s.poroti...@skbkontur.ru.invalid> wrote:
> >
> > > Hello!
> > >
> > > Is it possible to write a PySpark UDF that generates data for a streaming dataframe?
> > > I want to get data from REST API requests in real time and save this data to a dataframe, and then put it into Kafka.
> > > I can't figure out how to create a streaming dataframe from generated data.
> > >
> > > I am new to Spark streaming. Could you give me some hints?
> > >
> > > Best regards,
> > > Stanislav Porotikov
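The "check constantly if new data appears on the API server" step of the pipeline above can be sketched in plain Python, independent of how it is eventually hosted (a driver-side loop, foreachBatch, or a custom data source). In this sketch, `fetch_records` and `publish` are hypothetical stand-ins: in the real pipeline, `fetch_records` would be an HTTP GET against the REST API (e.g. with `urllib` or `requests`), `publish` would be a Kafka producer send, and the assumption that each record carries a unique `"id"` field is mine, not from the thread.

```python
# One polling cycle of the REST-API -> Kafka step: fetch the current
# records, drop ones already forwarded in earlier cycles, and hand the
# rest (JSON-encoded) to a publisher. fetch_records and publish are
# hypothetical stand-ins for an HTTP GET and a Kafka producer send.

import json


def poll_once(fetch_records, publish, seen_ids):
    """Fetch, deduplicate by the (assumed) unique "id" field, publish.

    fetch_records() -> list of dicts, each with an "id" key.
    publish(payload) -> sends one JSON string downstream.
    seen_ids        -> mutable set tracking already-forwarded ids.
    Returns the number of newly published records.
    """
    new = 0
    for record in fetch_records():
        if record["id"] in seen_ids:
            continue  # already forwarded in an earlier cycle
        seen_ids.add(record["id"])
        publish(json.dumps(record))
        new += 1
    return new
```

In production this loop would run on a schedule (or per micro-batch trigger), and the seen-id state would need to be persisted, or replaced by a cursor/timestamp parameter if the API supports incremental queries.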