> The last time this came up, some people were concerned that trusting 
> request-id on the wire was concerning to them because it's coming from 
> random users.

TBH I don't see why a validated request-id value can't be logged on the
callee service side; I have probably missed some earlier context. Could
you please give an example of such a concern?
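
To make "validated" concrete, here is a minimal sketch of a strict format
check, assuming the (req-$uuid) format from the proposal. The function
names are hypothetical and are not the oslo.middleware API.

```python
import re
import uuid

# Assumed format from the proposal: "req-" followed by a lowercase UUID.
_REQUEST_ID_RE = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def is_valid_request_id(value):
    """Return True only for strings shaped exactly like req-$uuid."""
    # fullmatch rejects any extra characters, including a trailing newline,
    # so a value cannot smuggle additional content into a log line.
    return isinstance(value, str) and _REQUEST_ID_RE.fullmatch(value) is not None


def generate_request_id():
    """Generate a local request id in the same format (hypothetical helper)."""
    return "req-" + str(uuid.uuid4())
```

A value that passes this check is just a fixed-length token, which is why
logging it on the callee side looks harmless to me.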

With the service user approach I see two blockers:
- A callee service needs to know whether the caller is a "special" user or not.
- Until all services use a service user, we won't get the complete trace.
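
As I understand the service user variant, the callee-side decision could
be sketched roughly as below. The function and the caller_is_service_user
flag are assumptions for illustration; how a service actually learns that
flag is exactly the first blocker above.

```python
import uuid


def resolve_request_ids(inbound_id, caller_is_service_user):
    """Return (global_id, local_id) for one incoming request.

    inbound_id is an already format-validated X-OpenStack-Request-ID value,
    or None if the header was absent or rejected.
    """
    # A local id is always generated, as in the proposal.
    local_id = "req-" + str(uuid.uuid4())

    # Trust the inbound id only when a service user was involved;
    # otherwise reset the global id to the locally generated one.
    if inbound_id is not None and caller_is_service_user:
        global_id = inbound_id
    else:
        global_id = local_id
    return global_id, local_id
```

Both ids would then be logged, so a chain is traceable only as far as
every hop actually goes through a service user.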

Sean Dague writes:

> One of the things that came up in a logging Forum session is how much 
> effort operators are having to put into reconstructing flows for things 
> like server boot when they go wrong, as every time we jump a service 
> barrier the request-id is reset to something new. The back and forth 
> between Nova / Neutron and Nova / Glance would be definitely well served 
> by this. Especially if this is something that's easy to query in elastic 
> search.
>
> The last time this came up, some people were concerned that trusting 
> request-id on the wire was concerning to them because it's coming from 
> random users. We're going to assume that's still a concern by some. 
> However, since the last time that came up, we've introduced the concept 
> of "service users", which are a set of higher priv services that we are 
> using to wrap user requests between services so that long running 
> request chains (like image snapshot). We trust these service users 
> enough to keep on trucking even after the user token has expired for 
> this long run operations. We could use this same trust path for 
> request-id chaining.
>
> So, the basic idea is, services will optionally take an inbound 
> X-OpenStack-Request-ID which will be strongly validated to the format 
> (req-$uuid). They will continue to always generate one as well. When the 
> context is built (which is typically about 3 more steps down the paste 
> pipeline), we'll check that the service user was involved, and if not, 
> reset the request_id to the local generated one. We'll log both the 
> global and local request ids. All of these changes happen in 
> oslo.middleware, oslo.context, oslo.log, and most projects won't need 
> anything to get this infrastructure.
>
> The python clients, and callers, will then need to be augmented to pass 
> the request-id in on requests. Servers will effectively decide when they 
> want to opt into calling other services this way.
>
> This only ends up logging the top line global request id as well as the 
> last leaf for each call. This does mean that full tree construction will 
> take more work if you are bouncing through 3 or more servers, but it's a 
> step which I think can be completed this cycle.
>
> I've got some more detailed notes, but before going through the process 
> of putting this into an oslo spec I wanted more general feedback on it 
> so that any objections we didn't think about yet can be raised before 
> going through the detailed design.
>
>       -Sean

-- 
Thanks,

Andrey Volkov,
Software Engineer, Mirantis, Inc.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
