Re: [openstack-dev] [swift] - question about statsd messages and 404 errors

Samuel Merritt Fri, 25 Jul 2014 15:47:41 -0700

On 7/25/14, 4:58 AM, Seger, Mark (Cloud Services) wrote:

I’m trying to track object server GET errors using statsd and I’m not
seeing them.  The test I’m doing is to simply do a GET on a
non-existent object.  As expected, a 404 is returned and the object
server log records it.  However, statsd implies it succeeded because
there were no errors reported.  A read of the admin guide does clearly
say the GET timing includes failed GETs, but my question then becomes
how is one to tell there was a failure?  Should there be another type of
message that DOES report errors?  Or how about including these in the
‘object-server.GET.errors.timing’ message?

What "error" means with respect to Swift's backend-server timing metrics is pretty fuzzy at the moment, and could probably use some work.

The idea is that object-server.GET.timing has timing data for everything that Swift handled successfully, and object-server.GET.errors.timing has timing data for things where Swift failed.

Some things are pretty easy to divide up. For example, a 200-series status code always counts as success, and a 500-series status code always counts as error.

It gets tricky in the 400-series status codes. For example, a 404 means that a client asked for an object that doesn't exist. That's not Swift's fault, so that goes into the success bucket (object-server.GET.timing). Similarly, a 412 means that a client set an unsatisfiable precondition in the If-Match, If-None-Match, If-Modified-Since, or If-Unmodified-Since headers, and Swift correctly determined that the requested object can't fulfill the precondition, so that one goes in the success bucket too.

However, there are other status codes that are more ambiguous. Consider 409; the object server responds with 409 if the request's X-Timestamp is less than the object's X-Timestamp (on PUT/POST/DELETE). You can get this with two near-simultaneous POSTs:


  1. request A hits proxy; proxy assigns X-Timestamp: 1406316223.851131
  2. request B hits proxy; proxy assigns X-Timestamp: 1406316223.851132
  3. request B hits object server and gets 202
  4. request A hits object server and gets 409

Does that error count as Swift's fault? If the client requests were nearly simultaneous, then I think not; there's always going to be *some* delay between accept() and gettimeofday(). On the other hand, if one proxy server's time is significantly behind another's, then it is Swift's fault.
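To make the four-step sequence above concrete, here is a rough sketch in Python. ToyObjectServer and its post() method are made up for illustration and are not Swift's real code; the only rule it encodes is the one described above (409 when the request's X-Timestamp is less than the stored one). The initial stored timestamp is an arbitrary assumption.

```python
from decimal import Decimal


class ToyObjectServer:
    """Hypothetical stand-in for an object server; not Swift's implementation."""

    def __init__(self, stored_timestamp):
        # X-Timestamp of the object currently on disk (assumed starting value).
        self.stored_timestamp = Decimal(stored_timestamp)

    def post(self, x_timestamp):
        request_ts = Decimal(x_timestamp)
        if request_ts < self.stored_timestamp:
            # Request is older than what is already stored: conflict.
            return 409
        self.stored_timestamp = request_ts
        return 202


server = ToyObjectServer("1406316000.000000")

# The proxy stamped A first, but B reaches the object server first.
status_b = server.post("1406316223.851132")  # request B
status_a = server.post("1406316223.851131")  # request A, one microsecond older

print(status_b, status_a)
```

Nothing in the sketch can tell whether the one-microsecond gap came from ordinary request racing or from clock skew between proxies, which is exactly why the 409 is hard to put in a bucket.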

It's even worse with 400; sometimes it's for bad paths (like asking an object server for /<partition>/<account>/<container>; this can happen if the administrator misconfigures their rings), and sometimes it's for bad X-Delete-At / X-Delete-After values (which are set by the client).
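Pulling the cases so far together, the division I'm describing looks roughly like the sketch below. This is only an illustration of the reasoning in this message, not the code Swift actually runs; timing_bucket and the bucket labels are names made up for the example.

```python
# Status codes this message treats as "not Swift's fault".
CLIENT_FAULT = {404, 412}
# Status codes whose blame depends on context (client vs. cluster).
AMBIGUOUS = {400, 409}


def timing_bucket(status):
    """Return a label for where a response's timing data would go.

    Hypothetical helper: it mirrors the reasoning in this message only.
    """
    if 200 <= status < 300:
        return "success"
    if 500 <= status < 600:
        return "error"
    if status in CLIENT_FAULT:
        return "success"
    if status in AMBIGUOUS:
        return "ambiguous"
    return "unclassified"


for code in (200, 404, 412, 409, 400, 507):
    print(code, timing_bucket(code))
```

The "ambiguous" label is the whole problem: a status code alone doesn't say whether the client or the cluster is to blame.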

I'm not sure what the best way to fix this is, but if you just want to see some error metrics, unmount a disk to get some 507s.


