Re: [atlas] RIPE Atlas probe status issues

2018-04-25 Thread Daniel AJ Sokolov
Thank you. Has the problem resurfaced?

My probe #1118 is showing offline again, although it isn't. I can see
the ongoing measurement results on my profile page.

BR
Daniel AJ

On 2018-04-23 at 06:37 AM, Chris Amin wrote:
> Dear RIPE Atlas users,
> 
> There were various issues relating to the recorded status of RIPE Atlas
> probes over the weekend. This was brought to our attention by internal
> monitoring and information provided by users on the mailing list.
> 
> Throughout this period most probes did actually remain connected to
> controllers, and measurement results were collected as normal. The side
> effects included:
> 
> * the number of probes reported as connected by the system was lower
> than it should have been
> * the status (connected/disconnected) of many probes was incorrect
> * new measurements took longer than usual to start
> * fewer probes than usual were available for new measurements, leading
> in some cases to “no suitable probes” messages when trying to schedule
> new measurements
> * various system tags were incorrectly applied, including many probes
> being marked as having USB problems when this was not the case
> * temporary discrepancies with crediting/debiting of RIPE Atlas credits
> for the connected time of probes
> 
> The issues were caused by a bug fix deployment on Friday at 9 AM UTC, in which a
> package was accidentally downgraded, causing a regression to an old bug
> in the task handling of the central system. This bug caused a backlog of
> messages to build, slowing down or stopping the registering of various
> status messages in the system. Problems built up gradually as the
> backlog increased, until the root cause was identified on Sunday
> morning. The issue was then fixed and the system stabilized completely
> by about 10AM UTC. We have identified procedural and technical solutions
> that will stop this problem happening again, and are looking at ways to
> improve our monitoring of these kinds of issues.
> 
> We apologise for any inconvenience or confusion caused by this event and
> would like to thank all of you who took the time to notify us of what
> you were seeing.
> 
> Kind regards,
> Chris Amin
> RIPE NCC
> 




[atlas] RIPE Atlas probe status issues

2018-04-23 Thread Chris Amin
Dear RIPE Atlas users,

There were various issues relating to the recorded status of RIPE Atlas
probes over the weekend. This was brought to our attention by internal
monitoring and information provided by users on the mailing list.

Throughout this period most probes did actually remain connected to
controllers, and measurement results were collected as normal. The side
effects included:

* the number of probes reported as connected by the system was lower
than it should have been
* the status (connected/disconnected) of many probes was incorrect
* new measurements took longer than usual to start
* fewer probes than usual were available for new measurements, leading
in some cases to “no suitable probes” messages when trying to schedule
new measurements
* various system tags were incorrectly applied, including many probes
being marked as having USB problems when this was not the case
* temporary discrepancies with crediting/debiting of RIPE Atlas credits
for the connected time of probes

The issues were caused by a bug fix deployment on Friday at 9 AM UTC, in which a
package was accidentally downgraded, causing a regression to an old bug
in the task handling of the central system. This bug caused a backlog of
messages to build, slowing down or stopping the registering of various
status messages in the system. Problems built up gradually as the
backlog increased, until the root cause was identified on Sunday
morning. The issue was then fixed and the system stabilized completely
by about 10AM UTC. We have identified procedural and technical solutions
that will stop this problem happening again, and are looking at ways to
improve our monitoring of these kinds of issues.

We apologise for any inconvenience or confusion caused by this event and
would like to thank all of you who took the time to notify us of what
you were seeing.

Kind regards,
Chris Amin
RIPE NCC


