[ 
https://issues.apache.org/jira/browse/SPARK-41665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Prelle updated SPARK-41665:
----------------------------------
    Description: 
Hi,
We noticed strange behavior in Spark Structured Streaming: when we set a trigger interval of,
for example, 1 minute, every query fires its batches at 0:00:00, 0:01:00, 0:02:00, ... no
matter when the query was started.
All queries are therefore "in sync", which can disturb a cluster and lead to spikes of
utilisation (a minimal sketch of the setup follows the screenshot below):
!image-2022-12-21-07-57-18-679.png!
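
For reference, here is a minimal sketch of two independent queries that both use a 1-minute
processing-time trigger; the rate source, console sink, object name, and the 30-second delay are
all illustrative choices, not part of our actual jobs. The second query is started about 30
seconds after the first, yet both fire at hh:mm:00.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object TriggerSyncDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("trigger-sync-demo").getOrCreate()

    // Any streaming source would do; "rate" is just a self-contained example.
    val stream = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

    // First query, started at e.g. 10:00:12.
    val q1 = stream.writeStream.format("console")
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()

    // Second query, started ~30 seconds later (e.g. 10:00:42).
    Thread.sleep(30 * 1000)
    val q2 = stream.writeStream.format("console")
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()

    // Observed: both queries trigger their next batches at 10:01:00, 10:02:00, ...
    // i.e. aligned to the wall clock, not to their respective start times.
    q1.awaitTermination()
  }
}
{code}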

In my opinion, the expected behavior should look like this:

!image-2022-12-21-07-57-32-654.png!

 

The cause is this line:
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TriggerExecutor.scala#L98]

Since now and intervalMs are both longs, now / intervalMs * intervalMs truncates (in my case)
the seconds, so the next batch time is always aligned to multiples of the interval rather than
to the query's start time. The test suite explicitly asserts this behavior
([https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ProcessingTimeExecutorSuite.scala#L36]),
so I do not know whether it is intentional or simply because this line has been there for
6 years. Either way, it affects all versions released in the last 6 years.
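
To make the arithmetic concrete, here is a small sketch (e.g. pasted into a Scala REPL). The
millisecond values are made up, and the start-time-anchored nextBatchTimeFromStart at the end is
only a hypothetical illustration of the behavior I would expect, not existing Spark code.

{code:scala}
// Simplified from the line linked above: the next batch time is floored to a
// multiple of the interval, then bumped by one interval.
def nextBatchTime(now: Long, intervalMs: Long): Long =
  now / intervalMs * intervalMs + intervalMs

val intervalMs = 60 * 1000L  // 1-minute trigger

// Two queries started 23 seconds apart (7s and 30s past the same minute)
// still get exactly the same next batch time, because the integer division
// drops the seconds:
nextBatchTime(1671606007000L, intervalMs)  // 1671606060000 (the next :00)
nextBatchTime(1671606030000L, intervalMs)  // 1671606060000 (same instant)

// Hypothetical alternative anchored to the query's own start time, which is
// what the second screenshot above illustrates (not Spark's current behavior):
def nextBatchTimeFromStart(now: Long, startMs: Long, intervalMs: Long): Long =
  startMs + ((now - startMs) / intervalMs + 1) * intervalMs

nextBatchTimeFromStart(1671606007000L, 1671606007000L, intervalMs)  // 1671606067000, 60s after this query's own start
{code}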

Regards

Thomas

 

  was:
Hi,
We noticed strange behavior in Spark Structured Streaming: when we set a trigger interval of,
for example, 1 minute, every query fires its batches at 0:00:00, 0:01:00, 0:02:00, ... no
matter when the query was started.
All queries are therefore "in sync", which can disturb a cluster and lead to spikes of
utilisation:
!image-2022-12-21-07-57-18-679.png!

In my opinion, the expected behavior should look like this:

!image-2022-12-21-07-57-32-654.png!

 

The cause is this line:
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TriggerExecutor.scala#L98]

Since now and intervalMs are both longs, now / intervalMs * intervalMs truncates (in my case)
the seconds. As it is explicitly written that way, I do not know whether it is intentional or
simply because this line has been there for 6 years. Either way, it affects all versions
released in the last 6 years.

Regards

Thomas

 


> Spark streaming query scheduling synchronisation  with Trigger Interval
> -----------------------------------------------------------------------
>
>                 Key: SPARK-41665
>                 URL: https://issues.apache.org/jira/browse/SPARK-41665
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.8, 3.0.3, 3.1.2, 3.2.2, 3.3.1
>            Reporter: Thomas Prelle
>            Priority: Major
>         Attachments: image-2022-12-21-07-57-18-679.png, 
> image-2022-12-21-07-57-32-654.png
>
>



