[ 
https://issues.apache.org/jira/browse/SOLR-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Salagnac updated SOLR-16843:
-----------------------------------
    Description: 
The autoscaling framework no longer exists in Solr 9+, so this bug only impacts 
Solr 8.x.
 
Triggers in the autoscaling framework use timestamps returned by the JVM call 
{{System.nanoTime()}}, but according to the Javadoc, this is NOT an 
absolute timestamp. It is just a number relative to an arbitrary origin, and 
this origin can change each time the JVM is restarted.
 
I found that this affects at least the following triggers (all with basically 
the same pattern):
- {{IndexSizeTrigger}}
- {{MetricTrigger}}
- {{SearchRateTrigger}}
 
These triggers fire an event when a certain condition (specific to each 
trigger) has been met for a configured period of time. They maintain a map of 
[what, timestamp] entries to track a short-term history, and remove an entry 
if the condition is no longer met, so that no event is triggered.
Timestamps come from {{System.nanoTime()}}. This is fine as long as the 
timestamps are only compared to each other within the same JVM. However, this 
map is persisted in ZooKeeper in case of an overseer change (written and read by 
{{TriggerBase.saveState()}} and {{restoreState()}}). After an overseer change, 
the {{nanoTime()}} origin is that of a different JVM. Consequently, none of 
the persisted timestamps from the previous overseer can be compared with the 
current JVM "clock".
As a result, triggers either never fire, or fire without waiting for the 
configured time.
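
To illustrate the failure mode, here is a minimal, self-contained sketch. It 
does not use any Solr code: the class {{NanoTimeOriginDemo}}, its methods and 
the key {{collection1}} are hypothetical, and {{System.nanoTime()}} is modelled 
as "real time plus a per-JVM offset" so that two JVMs can be simulated in one 
process.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

class NanoTimeOriginDemo {

    /**
     * Hypothetical model of System.nanoTime() in one JVM: real time plus an
     * offset that is fixed within that JVM but differs between JVMs.
     */
    static long nanoTime(long jvmOffsetNs, long realTimeNs) {
        return realTimeNs + jvmOffsetNs;
    }

    /** The "condition has held for waitFor" check described above. */
    static boolean shouldFire(long firstSeenNs, long nowNs, long waitForNs) {
        return nowNs - firstSeenNs >= waitForNs;
    }

    /**
     * The old overseer records the condition at real time 0 and persists it.
     * The new overseer restores it and evaluates the check realElapsedNs later.
     */
    static boolean firesAfterOverseerChange(long oldOffsetNs, long newOffsetNs,
                                            long realElapsedNs, long waitForNs) {
        Map<String, Long> persisted = new HashMap<>();
        persisted.put("collection1", nanoTime(oldOffsetNs, 0L)); // stands in for saveState()
        long firstSeenNs = persisted.get("collection1");         // stands in for restoreState()
        long nowNs = nanoTime(newOffsetNs, realElapsedNs);
        return shouldFire(firstSeenNs, nowNs, waitForNs);
    }

    public static void main(String[] args) {
        long waitForNs = TimeUnit.MINUTES.toNanos(10);
        long oldOffsetNs = TimeUnit.HOURS.toNanos(5);

        // New JVM has a smaller offset: 20 real minutes have passed, yet no event.
        System.out.println(firesAfterOverseerChange(
                oldOffsetNs, 0L, TimeUnit.MINUTES.toNanos(20), waitForNs)); // prints false

        // New JVM has a larger offset: the event fires after 1 real second.
        System.out.println(firesAfterOverseerChange(
                oldOffsetNs, TimeUnit.HOURS.toNanos(9),
                TimeUnit.SECONDS.toNanos(1), waitForNs)); // prints true
    }
}
```

With the same offset on both sides (no overseer change), the check behaves as 
configured; only the mix of two origins produces the wrong result.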
 
 
A simple fix could be to always use {{TimeSource.getEpochTimeNs()}} instead of 
{{getTimeNs()}} in the autoscaling code.
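
As a rough sketch of why an epoch-based value helps, the class below anchors 
{{System.nanoTime()}} to the wall clock once at construction, so its readings 
are nanoseconds since the Unix epoch. {{EpochNanoClock}} and its methods are 
hypothetical names for illustration; this is not presented as the actual 
{{TimeSource}} implementation.

```java
import java.util.concurrent.TimeUnit;

final class EpochNanoClock {
    // Wall-clock time (ns since the Unix epoch) captured once at construction.
    private final long epochStartNs;
    // nanoTime() reading taken at the same moment; only differences from it are used.
    private final long nanoStart;

    EpochNanoClock() {
        this.epochStartNs = TimeUnit.MILLISECONDS.toNanos(System.currentTimeMillis());
        this.nanoStart = System.nanoTime();
    }

    /** Nanoseconds since the Unix epoch, so the origin is the same in every JVM. */
    long epochTimeNs() {
        return epochStartNs + (System.nanoTime() - nanoStart);
    }

    /** Same check as the triggers use, but on epoch-based timestamps. */
    static boolean heldLongEnough(long firstSeenEpochNs, long nowEpochNs, long waitForNs) {
        return nowEpochNs - firstSeenEpochNs >= waitForNs;
    }

    public static void main(String[] args) {
        // Two instances stand in for the old and the new overseer JVM.
        EpochNanoClock oldOverseer = new EpochNanoClock();
        long firstSeen = oldOverseer.epochTimeNs(); // value that would be persisted

        EpochNanoClock newOverseer = new EpochNanoClock();
        long now = newOverseer.epochTimeNs();

        long diffMs = TimeUnit.NANOSECONDS.toMillis(Math.abs(now - firstSeen));
        System.out.println("difference between the two clocks: ~" + diffMs + " ms");
        System.out.println("fires too early: "
                + heldLongEnough(firstSeen, now, TimeUnit.MINUTES.toNanos(10)));
    }
}
```

Timestamps taken this way stay comparable after an overseer change, provided 
the wall clocks of the hosts involved are reasonably in sync.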



> timestamp issue with autoscaling framework
> ------------------------------------------
>
>                 Key: SOLR-16843
>                 URL: https://issues.apache.org/jira/browse/SOLR-16843
>             Project: Solr
>          Issue Type: Bug
>          Components: AutoScaling
>    Affects Versions: 8.11
>            Reporter: Pierre Salagnac
>            Priority: Minor
>              Labels: autoscaling


