
Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-09 Thread Mich Talebzadeh
Hi Ashok,

Thanks for pointing out the Databricks article "Scalable Spark Structured
Streaming for REST API Destinations" on the Databricks Blog.

I browsed it, and it is broadly similar to what many of us do with Spark
Structured Streaming and *foreachBatch*. Both that article and mine mention a
REST API as part of the architecture. However, there are notable
differences, I believe.

In my proposed approach (Event-Driven Model):

   - Spark Structured Streaming waits until the Flask REST API requests
   that events be generated within PySpark.
   - Messages are generated and then fed into any sink, driven by the
   Flask REST API's request.
   - This creates an event-driven model where Spark generates data only
   when prompted by external requests.
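A minimal sketch of this event-driven flow (all names here are hypothetical, not from my actual code; the Flask and Spark wiring is indicated in comments, and the on-demand generator itself is plain Python):

```python
import random
import time

def generate_events(n: int) -> list[dict]:
    """Generate a batch of n synthetic events on demand. In the real
    pipeline this batch would become a Spark DataFrame and be written
    to whatever sink the caller requested."""
    now = time.time()
    return [
        {"event_id": i, "ts": now,
         "temperature": round(random.uniform(15.0, 30.0), 2)}
        for i in range(n)
    ]

# Hypothetical Flask endpoint: Spark-side generation happens only when
# a client actually asks for it.
#
# from flask import Flask, jsonify
# app = Flask(__name__)
#
# @app.route("/generate/<int:n>")
# def generate(n):
#     batch = generate_events(n)
#     # df = spark.createDataFrame(batch)   # hand off to Spark
#     # df.write.format(...).save(...)      # feed any sink on request
#     return jsonify(batch)
```

The point of the sketch is simply that nothing runs until the HTTP request arrives; the batch size and sink are both decided by the caller.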





In the Databricks article's scenario (Continuous Data Stream):

   - An incoming stream of data from sources like Kafka, AWS Kinesis, or
   Azure Event Hubs is handled by foreachBatch.
   - As messages flow off this stream, calls are made to a REST API with
   some or all of the message data.
   - This is a continuous flow of data where messages are sent to a REST
   API as soon as they become available in the streaming source.
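For contrast, a rough sketch of that continuous pattern. The per-row sender below is plain Python with an injectable `post` callable (so it can be exercised without a network); the foreachBatch wiring, shown as comments, uses the standard PySpark API, but the endpoint URL is made up:

```python
import json
import urllib.request

def send_rows_to_rest_api(rows, url, post=None):
    """POST each micro-batch row to a REST endpoint as JSON.
    `post` is injectable so the HTTP call can be stubbed out."""
    if post is None:
        def post(payload: str):
            req = urllib.request.Request(
                url,
                data=payload.encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)
    sent = 0
    for row in rows:
        post(json.dumps(row))
        sent += 1
    return sent

# Hypothetical PySpark wiring: foreachBatch hands each micro-batch to
# the sender as messages flow off the stream.
#
# def process_batch(batch_df, batch_id):
#     send_rows_to_rest_api(
#         (r.asDict() for r in batch_df.collect()),
#         "https://example.com/ingest",   # made-up endpoint
#     )
#
# (stream_df.writeStream
#     .foreachBatch(process_batch)
#     .start())
```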


*Benefits of Event-Driven Model:*


   1. Responsiveness: Ideal for scenarios where data generation needs to be
   aligned with specific events or user actions.
   2. Resource Optimization: Can reduce resource consumption by processing
   data only when needed.
   3. Flexibility: Allows for dynamic control over data generation based on
   external triggers.

*Benefits of Continuous Data Stream Mode with foreachBatch:*

   1. Real-Time Processing: Facilitates immediate analysis and action on
   incoming data.
   2. Handling High Volumes: Well-suited for scenarios with
   continuous, high-volume data streams.
   3. Low-Latency Applications: Essential for applications requiring near
   real-time responses.

*Potential Use Cases for my approach:*

   - On-Demand Data Generation: Generating data for
   simulations, reports, or visualizations based on user requests.
   - Triggered Analytics: Executing specific analytics tasks only when
   certain events occur, such as detecting anomalies or breaching
   thresholds (say, fraud detection).
   - Custom ETL Processes: Facilitating data extraction, transformation,
   and loading workflows based on external events or triggers.


One thing to note on latency: event-driven models like mine can introduce
slight latency compared to continuous processing, since data generation
waits on external API calls.

So my approach is more event-triggered and responsive to external requests,
while foreachBatch scenario is more continuous and real-time, processing
and sending data as it becomes available.

In summary, both approaches have their merits and are suited to different
use cases depending on the nature of the data flow and processing
requirements.

Cheers

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 9 Jan 2024 at 19:11, ashok34...@yahoo.com wrote:

> Hey Mich,
>
> Thanks for this introduction on your forthcoming proposal "Spark
> Structured Streaming and Flask REST API for Real-Time Data Ingestion and
> Analytics". I recently came across an article by Databricks with the
> title Scalable Spark Structured Streaming for REST API Destinations.
> Their use case is similar to your suggestion but what they are saying
> is that they have incoming stream of data from sources like Kafka, AWS
> Kinesis, or Azure Event Hub. In other words, a continuous flow of data
> where messages are sent to a REST API as soon as they are available in the
> streaming source. Their approach is practical but wanted to get your
> thoughts on their article with a better understanding on your proposal and
> differences.
>
> Thanks
>
>
> On Tuesday, 9 January 2024 at 00:24:19 GMT, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>
> Please also note that Flask, by default, is a single-threaded web
> framework. While it is suitable for development and small-scale
> applications, it may not handle concurrent requests efficiently in a
> production environment.
> In production, one can utilise Gunicorn (Green Unicorn), a WSGI (Web
> Server Gateway Interface) HTTP server that is commonly used to serve
> Flask applications in production.

Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-09 Thread ashok34...@yahoo.com.INVALID
Hey Mich,

Thanks for this introduction on your forthcoming proposal "Spark Structured
Streaming and Flask REST API for Real-Time Data Ingestion and Analytics". I
recently came across an article by Databricks with the title Scalable Spark
Structured Streaming for REST API Destinations. Their use case is similar to
your suggestion, but what they are saying is that they have an incoming
stream of data from sources like Kafka, AWS Kinesis, or Azure Event Hubs. In
other words, a continuous flow of data where messages are sent to a REST API
as soon as they are available in the streaming source. Their approach is
practical, but I wanted to get your thoughts on their article for a better
understanding of your proposal and the differences.
Thanks

On Tuesday, 9 January 2024 at 00:24:19 GMT, Mich Talebzadeh wrote:

Please also note that Flask, by default, is a single-threaded web framework.
While it is suitable for development and small-scale applications, it may
not handle concurrent requests efficiently in a production environment.

In production, one can utilise Gunicorn (Green Unicorn), a WSGI (Web Server
Gateway Interface) HTTP server that is commonly used to serve Flask
applications in production. It provides multiple worker processes, each
capable of handling a single request at a time. This makes Gunicorn suitable
for handling multiple simultaneous requests and improves the concurrency and
performance of your Flask application.
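As a concrete illustration (assuming the Flask application object is named `app` in a module `app.py`; adjust to your own layout), Gunicorn could be started along these lines:

```shell
# Install Gunicorn alongside Flask
pip install gunicorn

# Serve the Flask app with 4 worker processes, each handling one
# request at a time, listening on port 8000
gunicorn --workers 4 --bind 0.0.0.0:8000 app:app
```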

HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom




 


On Mon, 8 Jan 2024 at 19:30, Mich Talebzadeh  wrote:

Thought it might be useful to share my idea with fellow forum members.
During the breaks, I worked on the seamless integration of Spark Structured
Streaming with a Flask REST API for real-time data ingestion and analytics.
The use case revolves around a scenario where data is generated through REST
API requests in real time. The Flask REST API efficiently captures and
processes this data, saving it to a Spark Structured Streaming DataFrame.
Subsequently, the processed data can be channelled into any sink of your
choice, including a Kafka pipeline, giving a robust end-to-end solution for
dynamic and responsive data streaming. I will delve into the architecture,
implementation, and benefits of this combination, enabling one to build an
agile and efficient real-time data application. I will put the code in
GitHub for everyone's benefit. Hopefully your comments will help me improve
it.
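One way to sketch this hand-off (hypothetical paths and names throughout, not my actual implementation): the Flask side persists each incoming payload into a landing directory, Spark watches that directory as a streaming file source, and the stream is then channelled to a Kafka sink. The persistence helper is plain Python; the Spark wiring is indicated in comments:

```python
import json
import os
import time
import uuid

def persist_event(payload: dict, landing_dir: str) -> str:
    """Write one incoming REST payload as a JSON file into a landing
    directory that a Spark file-source stream is watching."""
    os.makedirs(landing_dir, exist_ok=True)
    name = f"{int(time.time() * 1000)}-{uuid.uuid4().hex}.json"
    path = os.path.join(landing_dir, name)
    with open(path, "w") as f:
        json.dump(payload, f)
    return path

# Hypothetical Spark side: read the landing directory as a stream and
# channel it into a Kafka sink.
#
# df = spark.readStream.schema(event_schema).json("/data/landing")
# (df.selectExpr("to_json(struct(*)) AS value")
#    .writeStream.format("kafka")
#    .option("kafka.bootstrap.servers", "broker:9092")   # made-up broker
#    .option("topic", "events")
#    .option("checkpointLocation", "/data/checkpoints/events")
#    .start())
```

The unique file name (timestamp plus UUID) avoids collisions between concurrent Flask workers writing into the same directory.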
Cheers
Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom



