Re: Online classes for spark topics

2023-03-12 Thread vaquar khan
I saw you are looking for Holden's video. Please find the following link:

https://www.oreilly.com/library/view/debugging-apache-spark/9781492039174/

Regards,
Vaquar khan


On Sun, Mar 12, 2023, 6:56 PM Mich Talebzadeh 
wrote:

> Hi Denny,
>
> Thanks for the offer. How do you envisage that structure?
>
>
> Also, it would be good to have a webinar (for a given topic) for different
> target audiences, as we have a mix of members in the Spark forums: for
> example, beginners, intermediate and advanced.
>
>
> Do we have a Confluence page for Spark that we can use? I guess that
> would be part of the structure you mentioned.
>
>
> HTH
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sun, 12 Mar 2023 at 22:59, Denny Lee  wrote:
>
>> Looks like we have some good topics here - I'm glad to help with setting
>> up the infrastructure to broadcast, if that helps.
>>
>> On Thu, Mar 9, 2023 at 6:19 AM neeraj bhadani <
>> bhadani.neeraj...@gmail.com> wrote:
>>
>>> I am happy to be a part of this discussion as well.
>>>
>>> Regards,
>>> Neeraj
>>>
>>> On Wed, 8 Mar 2023 at 22:41, Winston Lai  wrote:
>>>
 +1, any webinar on a Spark-related topic would be appreciated.

 Thank You & Best Regards
 Winston Lai
 --
 *From:* asma zgolli 
 *Sent:* Thursday, March 9, 2023 5:43:06 AM
 *To:* karan alang 
 *Cc:* Mich Talebzadeh ; ashok34...@yahoo.com
 ; User 
 *Subject:* Re: Online classes for spark topics

 +1

 On Wed, 8 Mar 2023 at 21:32, karan alang wrote:

 +1 .. I'm happy to be part of these discussions as well !




 On Wed, Mar 8, 2023 at 12:27 PM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

 Hi,

 I guess I can schedule this work over the course of time. I myself
 can contribute as well as learn from others.

 So +1 for me.

 Let us see if anyone else is interested.

 HTH







 On Wed, 8 Mar 2023 at 17:48, ashok34...@yahoo.com 
 wrote:


 Hello Mich.

 Greetings. Would you be able to arrange a Spark Structured Streaming
 learning webinar?

 This is something I have been struggling with recently. It would be
 very helpful.

 Thanks and regards

 AK
 On Tuesday, 7 March 2023 at 20:24:36 GMT, Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:


 Hi,

 This might be a worthwhile exercise, on the assumption that the
 contributors will find the time and bandwidth to chip in, so to speak.

 I am sure there are many, but off the top of my head I can think of Holden
 Karau for k8s and Sean Owen for data science. They are both very
 experienced.

 Anyone else?

 HTH







 On Tue, 7 Mar 2023 at 19:17, ashok34...@yahoo.com.INVALID
  wrote:

 Hello gurus,

 Does Spark arrange online webinars on special topics such as Spark on
 K8s, data science and Spark Structured Streaming?

 I would be most grateful if experts could share their experience with
 learners of intermediate knowledge like myself. Hopefully we will find
 their practical experience valuable.

 Respectfully,

 AK




>>>


Re: Online classes for spark topics

2023-03-12 Thread Mich Talebzadeh
Hi Denny,

Thanks for the offer. How do you envisage that structure?


Also, it would be good to have a webinar (for a given topic) for different
target audiences, as we have a mix of members in the Spark forums: for
example, beginners, intermediate and advanced.


Do we have a Confluence page for Spark that we can use? I guess that would
be part of the structure you mentioned.


HTH








Re: Online classes for spark topics

2023-03-12 Thread Denny Lee
Looks like we have some good topics here - I'm glad to help with setting up
the infrastructure to broadcast, if that helps.



Re: Spark StructuredStreaming - watermark not working as expected

2023-03-12 Thread Mich Talebzadeh
OK

ts is the timestamp, right?

This is similar code that works out the average temperature over a time
frame of 5 minutes.
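To make the 5-minute window arithmetic concrete, here is a minimal pure-Python sketch (no Spark needed) of what a tumbling window followed by an average, i.e. groupBy(window(ts, "5 minutes", "5 minutes")).avg('temperature'), computes for a handful of readings. It assumes, as Spark does when no startTime is given, that windows are aligned to the Unix epoch; the helper names (window_for, avg_by_window) and the sample readings are hypothetical, not taken from the thread.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumed to mirror window(col, "5 minutes", "5 minutes"): a 5-minute tumbling window
WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_for(ts):
    """Return the [start, end) tumbling window containing ts, aligned to the Unix epoch."""
    start = ts - (ts - EPOCH) % WINDOW
    return start, start + WINDOW


def avg_by_window(readings):
    """Average (timestamp, temperature) pairs per window, like groupBy(window(...)).avg(...)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, temperature in readings:
        key = window_for(ts)
        sums[key] += temperature
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Hypothetical sample readings
readings = [
    (datetime(2023, 3, 11, 4, 16, tzinfo=timezone.utc), 20.0),
    (datetime(2023, 3, 11, 4, 19, 59, tzinfo=timezone.utc), 22.0),
    (datetime(2023, 3, 11, 4, 20, tzinfo=timezone.utc), 30.0),
]

for (start, end), avg in avg_by_window(readings).items():
    print(start.time(), end.time(), avg)
# 04:15:00 04:20:00 21.0
# 04:20:00 04:25:00 30.0
```

Note that 04:20:00 falls in the next window: window starts are inclusive and ends are exclusive.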

Note the comments, and that errors are caught with try/except:

import sys

from pyspark.sql import functions as F
from pyspark.sql.functions import col, from_json, window

# This runs inside a class method: self.spark, config, schema and
# checkpoint_path are defined elsewhere in the application.
try:

    # construct a streaming dataframe streamingDataFrame that subscribes to topic temperature.
    # Per the Spark Kafka integration guide, Kafka consumer properties only reach
    # the consumer when they carry the "kafka." prefix.
    streamingDataFrame = self.spark \
        .readStream \
        .format("kafka") \
        .option("kafka.bootstrap.servers", config['MDVariables']['bootstrapServers']) \
        .option("schema.registry.url", config['MDVariables']['schemaRegistryURL']) \
        .option("group.id", config['common']['appName']) \
        .option("zookeeper.connection.timeout.ms", config['MDVariables']['zookeeperConnectionTimeoutMs']) \
        .option("rebalance.backoff.ms", config['MDVariables']['rebalanceBackoffMS']) \
        .option("zookeeper.session.timeout.ms", config['MDVariables']['zookeeperSessionTimeOutMs']) \
        .option("auto.commit.interval.ms", config['MDVariables']['autoCommitIntervalMS']) \
        .option("subscribe", "temperature") \
        .option("failOnDataLoss", "false") \
        .option("includeHeaders", "true") \
        .option("startingOffsets", "latest") \
        .load() \
        .select(from_json(col("value").cast("string"), schema).alias("parsed_value"))

    resultC = streamingDataFrame.select(
        col("parsed_value.rowkey").alias("rowkey"),
        col("parsed_value.timestamp").alias("timestamp"),
        col("parsed_value.temperature").alias("temperature"))

    """
    We work out the window and the AVG(temperature) in the window's timeframe below.
    This should return the following DataFrame as a struct:

     root
      |-- window: struct (nullable = false)
      |    |-- start: timestamp (nullable = true)
      |    |-- end: timestamp (nullable = true)
      |-- avg(temperature): double (nullable = true)
    """
    resultM = resultC \
        .withWatermark("timestamp", "5 minutes") \
        .groupBy(window(resultC.timestamp, "5 minutes", "5 minutes")) \
        .avg('temperature')

    # We take the above DataFrame and flatten it to get the columns
    # aliased as "startOfWindowFrame", "endOfWindowFrame" and "AVGTemperature"
    resultMF = resultM.select(
        F.col("window.start").alias("startOfWindowFrame"),
        F.col("window.end").alias("endOfWindowFrame"),
        F.col("avg(temperature)").alias("AVGTemperature"))

    resultMF.printSchema()

    # 'complete' mode outputs every window on each trigger and keeps all
    # aggregates in state, so the watermark does not purge old windows here;
    # that only happens in 'append' or 'update' mode.
    result = resultMF \
        .writeStream \
        .outputMode('complete') \
        .option("numRows", 1000) \
        .option("truncate", "false") \
        .format('console') \
        .option('checkpointLocation', checkpoint_path) \
        .queryName("temperature") \
        .start()

except Exception as e:
    print(f"""{e}, quitting""")
    sys.exit(1)
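The thread's subject is a watermark that does not seem to take effect. One documented cause worth checking is the output mode: the Structured Streaming programming guide states that watermark-based cleanup of aggregation state needs append or update mode, because complete mode must keep every aggregate. The pure-Python sketch below is a simplified model, not Spark's implementation, of what append mode does with withWatermark("timestamp", "5 minutes"): the watermark trails the largest event time seen by the delay, advances at the end of each micro-batch, finalises a window once it has passed the window's end, and drops rows older than itself. Real Spark only guarantees not to drop rows within the threshold (older rows may or may not be dropped) and evicts state in a following micro-batch. The names run_append_mode and at, and the batches, are hypothetical.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)  # models window(..., "5 minutes", "5 minutes")
DELAY = timedelta(minutes=5)   # models withWatermark("timestamp", "5 minutes")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts):
    """Start of the epoch-aligned tumbling window containing ts."""
    return ts - (ts - EPOCH) % WINDOW


def run_append_mode(batches):
    """Simplified model of append-mode windowed averaging with a watermark.

    Returns (emitted rows, windows still held in state).
    """
    state = {}            # window start -> [sum, count]
    watermark = None      # no watermark until a batch with data has been seen
    max_event_time = None
    emitted = []          # (window start, average) rows, each output exactly once
    for batch in batches:
        for ts, temperature in batch:
            # Rows older than the current watermark are treated as too late and dropped
            if watermark is not None and ts < watermark:
                continue
            acc = state.setdefault(window_start(ts), [0.0, 0])
            acc[0] += temperature
            acc[1] += 1
            if max_event_time is None or ts > max_event_time:
                max_event_time = ts
        # The watermark advances at the end of each micro-batch
        if max_event_time is not None:
            watermark = max_event_time - DELAY
        if watermark is None:
            continue
        # A window is final once the watermark has passed its end: emit it, drop its state
        for start in sorted(state):
            if start + WINDOW <= watermark:
                total, count = state.pop(start)
                emitted.append((start, total / count))
    return emitted, state


def at(minute):
    """Hypothetical event time on 2023-03-11 at 04:<minute> UTC."""
    return datetime(2023, 3, 11, 4, minute, tzinfo=timezone.utc)


batches = [
    [(at(16), 20.0), (at(18), 22.0)],  # watermark becomes 04:13
    [(at(27), 25.0)],                  # watermark becomes 04:22: 04:15-04:20 is final
    [(at(17), 99.0), (at(29), 27.0)],  # 04:17 is older than 04:22, so it is dropped
]
emitted, open_windows = run_append_mode(batches)
print(emitted)       # only the 04:15 window, with average 21.0
print(open_windows)  # the 04:25 window is still waiting for the watermark
```

In complete mode, by contrast, the model would never pop anything from state, which is why old windows keep reappearing in the console output.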



HTH






On Sat, 11 Mar 2023 at 04:33, karan alang  wrote:

> Hi Mich -
> Here is the output of the ldf.printSchema() & ldf.show() commands.
>
> ldf.printSchema()
>
> root
>  |-- applianceName: string (nullable = true)
>  |-- timeslot: long (nullable = true)
>  |-- customer: string (nullable = true)
>  |-- window: struct (nullable = false)
>  |    |-- start: timestamp (nullable = true)
>  |    |-- end: timestamp (nullable = true)
>  |-- sentOctets: long (nullable = true)
>  |-- recvdOctets: long (nullable = true)
>
>
>  ldf.show() :
>
>
> +-------------+--------+--------+------------------------------------------+----------+-----------+
> |applianceName|timeslot|customer|window                                    |sentOctets|recvdOctets|
> +-------------+--------+--------+------------------------------------------+----------+-----------+
> |abc1         |2797514 |cust1   |{2023-03-11 04:15:00, 2023-03-11 04:30:00}|21459264  |32211859   |
> |pqrq         |2797513 |cust1   |{2023-03-11 04:15:00, 2023-03-11 04:30:00}|17775527  |31331093   |
> |xyz|2797514|cust1 |{2023-03-11 04:15:00,
> 2023-03-11