Re: [atlas] One-off measurements not terminating

2019-12-30 Thread Steve Gibbard
Thanks Chris!  Atlas now looks like it’s behaving the way it did before 
December 24 — stopping one-off measurements about five to ten minutes after 
they start — which suits my purposes nicely.

As far as manual ‘deletes’ go, it doesn’t look like my efforts to ‘delete’ 
measurements as soon as I’ve been able to pick up results are working.  There’s 
no need to fix this on my account — now that measurements are stopped 
automatically again, I’ll probably delete the attempted work-around from my 
code — but here are details in case they’re otherwise useful:

The process www.globaltraceroute.com follows is this:

- Create a measurement.  Get a measurement ID.
- Immediately begin asking for a result every five seconds until it gets one.
- Display the result to the user.
- New as of yesterday: send a ‘delete’ in an attempt to stop the measurement.
- Exit.
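
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual www.globaltraceroute.com code: `create`, `fetch_results` and `stop` are hypothetical callables standing in for whatever HTTP requests the real script makes, and `max_polls` is an assumed safety limit that the description above does not mention.

```python
import time


def run_one_off(create, fetch_results, stop, poll_interval=5.0,
                max_polls=120, sleep=time.sleep):
    """Create a measurement, poll until results arrive, then request a stop.

    create, fetch_results and stop are hypothetical callables supplied by
    the caller; they are placeholders, not part of any real client library.
    """
    measurement_id = create()
    results = []
    for _ in range(max_polls):
        results = fetch_results(measurement_id)
        if results:
            break
        # No result yet: wait before asking again (five seconds by default).
        sleep(poll_interval)
    # Send the 'delete' even if polling gave up, so the measurement does not
    # keep counting against the concurrent-measurement limit.
    stop(measurement_id)
    return measurement_id, results
```

The caller displays the returned results to the user. Sending the stop request even when no result arrived is a design choice of this sketch, not something the process above specifies.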

Yesterday, the measurements my code attempted to stop this way kept on running 
indefinitely, just as if it hadn’t sent a ‘delete’ request.  However, if I 
waited five or ten minutes and ran the function that sends the delete, the 
measurement would stop.  It was a little hard to tell that it was working 
because measurements would take several minutes to show up as stopped, but 
when they did the timestamp for the end of the measurement would match the time 
I ran the ‘delete’ function.

Today, it’s again a little hard to tell what’s doing what.  The measurements 
are all showing as stopped eventually.  But if they had been stopped by the 
delete my script sent, based on what I saw yesterday I assume the stop 
timestamp would be within a minute or two after the start timestamp.  Instead, 
the stop timestamp is five to ten minutes after the start timestamp, 
suggesting that they’re continuing to run until Atlas, with your latest fix, 
decides they’re finished.

Measurement 23732704 is a random example of this — a measurement that was sent 
a ‘delete’ 50 seconds after creation (Dec 30 21:25:14 UTC), but didn’t stop 
until roughly five minutes later - 2019-12-30 21:30 per 
https://atlas.ripe.net/measurements/ .
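
The timestamp comparison can be expressed as a small check. This is only a sketch under stated assumptions: the two-minute tolerance is an assumed threshold based on "within a minute or two", 21:25:14 UTC is taken to be the time the ‘delete’ was sent, and the stop time is entered at the minute resolution the portal shows.

```python
from datetime import datetime, timedelta, timezone


def stopped_by_delete(delete_sent, stop_time, tolerance=timedelta(minutes=2)):
    """Return True if stop_time lies within `tolerance` of delete_sent.

    The absolute difference is used because the portal shows stop times
    rounded to the minute, so a stop can appear slightly before the delete.
    """
    return abs(stop_time - delete_sent) <= tolerance


# Values for measurement 23732704, under the assumptions stated above.
delete_sent = datetime(2019, 12, 30, 21, 25, 14, tzinfo=timezone.utc)
stop_time = datetime(2019, 12, 30, 21, 30, tzinfo=timezone.utc)
print(stopped_by_delete(delete_sent, stop_time))  # prints False
```

A result of False here is consistent with the measurement having run until Atlas stopped it rather than stopping in response to the ‘delete’.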

Alternatively, if you have a real-time view into the Atlas API, you could go to 
www.globaltraceroute.com and create a measurement.  It should then show up 
immediately in Atlas under the s...@gibbard.org username and go through the 
process outlined above.

Thanks,
Steve

Re: [atlas] One-off measurements not terminating

2019-12-30 Thread Chris Amin
Hi Steve,

There was indeed a problem where measurements were not being
automatically updated with a "stopped" status. This should now be fixed,
but please let me know if you notice any lingering issues.

Can you confirm that the issue with manually DELETEing not having an
effect still persists? If so, can you give me an example measurement ID?

Thanks,
Chris Amin
RIPE NCC

On 30/12/2019 06:53, Steve Gibbard wrote:
> An update: I was able to ‘delete’ my stuck measurements via the API, so 
> they’re stopped now and I’m back up and running for the moment.
> 
> I also added an API command to my code to ‘delete’ measurements as soon as 
> the results have been picked up, which I hoped would make this fix 
> sustainable, but so far that doesn’t seem to be doing anything.  Perhaps a 
> longer delay is required between creating the measurement and sending the 
> ‘delete’ command?
> 
> Thanks,
> Steve
> 
>> On Dec 28, 2019, at 3:20 PM, Steve Gibbard wrote:
>>
>> Hi Atlas folks,
>>
>> I hope you’re having a good holiday season.  Sorry to interrupt it by 
>> complaining about issues.
>>
>> On Christmas Eve my time (early Christmas morning your time) there was an 
>> Atlas issue where any attempt at reading measurements failed with an HTTP 
>> 500 status error.  That appears to have gotten fixed on Christmas (a really 
>> big thank you to whoever worked on that) but since then it appears that 
>> while most of the one-off measurements we’ve created have delivered results 
>> very quickly, none of the measurements created since 17:00 UTC on 2019-12-25 
>> have stopped running.  As shown in the Atlas portal:
>>
>>
>> 23722197  Traceroute  www.globaltraceroute.com (AS13335)  Test Traceroute  1  one-off  2019-12-25 22:24  Never
>> 23722089  Traceroute  archive.ubuntu.com (AS41231)        Test Traceroute  1  one-off  2019-12-25 19:16  Never
>> 23722088  Traceroute  sps.prima.com.ar (AS10318)          Test Traceroute  1  one-off  2019-12-25 19:14  Never
>> 23721915  Traceroute  www.globaltraceroute.com (AS13335)  Test Traceroute  1  one-off  2019-12-25 17:00  Never
>>
>> And on for every measurement between then and now.
>>
>> Previously, the typical one-off measurement was listed with start and stop 
>> times less than 10 minutes apart.
>>
>> When a user has 100 measurements running concurrently, creation of new 
>> measurements fails, which is happening for me now.
>>
>> If somebody could take a look at this, I’d really appreciate it.
>>
>> Thanks,
>> Steve