Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-17 Thread Sean Dague
On 05/16/2017 01:28 PM, John Dickinson wrote:

> I'm not sure the best place to respond (mailing list or gerrit), so
> I'll write this up and post it to both places.
> 
> I think the idea behind this proposal is great. It has the potential
> to bring a lot of benefit to users who are tracing a request across
> many different services, in part by making it easy to search in an
> indexing system like ELK.
> 
> The current proposal has some elements that won't work with the way
> Swift currently solves this problem. This is mostly due to the
> proposed uuid-ish check for validation. However, the Swift solution
> has a few aspects that I believe would be very helpful for the entire
> community.
> 
> NB: Swift returns both an |X-OpenStack-Request-ID| and an |X-Trans-ID|
> header in every response. The |X-Trans-ID| was implemented before the
> OpenStack request ID was proposed, and so we've kept the |X-Trans-ID| so
> as not to break existing clients. The value of |X-OpenStack-Request-ID|
> in any response from Swift is simply a mirror of the |X-Trans-ID| value.
> 
> The request id in Swift is made up of a few parts:
> 
> |X-Openstack-Request-Id: txbea0071df2b0465082501-00591b3077saio-extraextra |
> 
> In the code, this is generated from:
> 
> |'tx%s-%010x%s' % (uuid.uuid4().hex[:21], time.time(),
> quote(trans_id_suffix)) |
> 
> ...meaning that there are three parts to the request id. Let's take
> each in turn.
> 
> The first part always starts with 'tx' (originally from the
> "transaction id") and then is the first 21 hex characters of a uuid4.
> The truncation is to limit the overall length of the value.
> 
> The second part is the hex value of the current time, padded to 10
> characters.
> 
> Finally, the third part is the quoted suffix, and it defaults to the
> empty string. The suffix itself can be made of two parts. The first is
> configured in the Swift proxy server itself (ie the service that does
> the logging) via the |trans_id_suffix| config. This allows an operator
> to set a different suffix for each API endpoint or each region or each
> cluster in order to help distinguish them in logs. For example, if a
> deployment with multiple clusters uses centralized log aggregation, a
> different trans_id_suffix value for each cluster makes it very easy to
> distinguish between the clusters' logs.
> 
> The last part of the suffix is settable via the end-user (ie the one
> calling the API). When the "X-Trans-ID-Extra" header is present in a
> request, its value is quoted and appended to the final transaction id value.
> 
> Here's a curl example that shows this all put together. You can see
> that I have my Swift-All-In-One dev environment configured to use the
> "saio" value for the |trans_id_suffix| value:
> 
> $ curl -i -H "X-Auth-Token: AUTH_tk1bab51ce5e1d4e2bb6c54bccf59433ee" \
>     http://saio:8080/v1/AUTH_test/c/o --data-binary 1234 -XPUT \
>     -H "x-trans-id-extra: extraextra"
> HTTP/1.1 201 Created
> Last-Modified: Tue, 16 May 2017 17:04:17 GMT
> Content-Length: 0
> Etag: 81dc9bdb52d04dc20036dbd8313ed055
> Content-Type: text/html; charset=UTF-8
> X-Trans-Id: txf766cc02859c450eb4aef-00591b3110saio-extraextra
> X-Openstack-Request-Id: txf766cc02859c450eb4aef-00591b3110saio-extraextra
> Date: Tue, 16 May 2017 17:04:16 GMT
>
> $ curl -i -H "X-Auth-Token: AUTH_tk1bab51ce5e1d4e2bb6c54bccf59433ee" \
>     http://saio:8080/v1/AUTH_test/c/o \
>     -H "x-trans-id-extra: moredifferentextra"
> HTTP/1.1 200 OK
> Content-Length: 4
> Accept-Ranges: bytes
> Last-Modified: Tue, 16 May 2017 17:04:17 GMT
> Etag: 81dc9bdb52d04dc20036dbd8313ed055
> X-Timestamp: 1494954256.25977
> Content-Type: application/x-www-form-urlencoded
> X-Trans-Id: tx4013173098b348b6b7952-00591b34d2saio-moredifferentextra
> X-Openstack-Request-Id: tx4013173098b348b6b7952-00591b34d2saio-moredifferentextra
> Date: Tue, 16 May 2017 17:20:18 GMT
>
> 1234
>
> $ curl -i -H "X-Auth-Token: AUTH_tk1bab51ce5e1d4e2bb6c54bccf59433ee" \
>     http://saio:8080/v1/AUTH_test/c/o
> HTTP/1.1 200 OK
> Content-Length: 4
> Accept-Ranges: bytes
> Last-Modified: Tue, 16 May 2017 17:04:17 GMT
> Etag: 81dc9bdb52d04dc20036dbd8313ed055
> X-Timestamp: 1494954256.25977
> Content-Type: application/x-www-form-urlencoded
> X-Trans-Id: txf66856a06d7547c4ad79d-00591b3527saio
> X-Openstack-Request-Id: txf66856a06d7547c4ad79d-00591b3527saio
> Date: Tue, 16 May 2017 17:21:43 GMT
>
> 1234
> 
> The |X-Trans-ID-Extra| header, specifically, sounds very similar to
> what is being proposed to solve the cross-project request IDs. To
> quote from the commit in Swift that added this:
> 
> |The value of the X-Trans-Id-Extra header on the request (if any) will
> now be appended to the transaction ID. This lets users put their own
> information into transaction IDs. For example, Glance folks upload
> images as large objects, so they'd like to be able to tie together all
> the segment PUTs and the manifest PUT with some operation ID in the
> logs. This would let them pass in that operation ID as X-Trans-Id-Extra,
> and then when things went wrong, it'd be

Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-16 Thread John Dickinson


On 14 May 2017, at 4:04, Sean Dague wrote:

> One of the things that came up in a logging Forum session is how much effort 
> operators are having to put into reconstructing flows for things like server 
> boot when they go wrong, as every time we jump a service barrier the 
> request-id is reset to something new. The back and forth between Nova / 
> Neutron and Nova / Glance would be definitely well served by this. Especially 
> if this is something that's easy to query in elastic search.
>
> The last time this came up, some people were concerned that trusting 
> request-id on the wire was concerning to them because it's coming from random 
> users. We're going to assume that's still a concern by some. However, since 
> the last time that came up, we've introduced the concept of "service users", 
> which are a set of higher priv services that we are using to wrap user 
> requests between services so that long running request chains (like image 
> snapshot). We trust these service users enough to keep on trucking even after 
> the user token has expired for this long run operations. We could use this 
> same trust path for request-id chaining.
>
> So, the basic idea is, services will optionally take an inbound 
> X-OpenStack-Request-ID which will be strongly validated to the format 
> (req-$uuid). They will continue to always generate one as well. When the 
> context is built (which is typically about 3 more steps down the paste 
> pipeline), we'll check that the service user was involved, and if not, reset 
> the request_id to the local generated one. We'll log both the global and 
> local request ids. All of these changes happen in oslo.middleware, 
> oslo.context, oslo.log, and most projects won't need anything to get this 
> infrastructure.
>
> The python clients, and callers, will then need to be augmented to pass the 
> request-id in on requests. Servers will effectively decide when they want to 
> opt into calling other services this way.
>
> This only ends up logging the top line global request id as well as the last 
> leaf for each call. This does mean that full tree construction will take more 
> work if you are bouncing through 3 or more servers, but it's a step which I 
> think can be completed this cycle.
>
> I've got some more detailed notes, but before going through the process of 
> putting this into an oslo spec I wanted more general feedback on it so that 
> any objections we didn't think about yet can be raised before going through 
> the detailed design.
>
>   -Sean
>
> -- 
> Sean Dague
> http://dague.net
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


I'm not sure the best place to respond (mailing list or gerrit), so
I'll write this up and post it to both places.

I think the idea behind this proposal is great. It has the potential
to bring a lot of benefit to users who are tracing a request across
many different services, in part by making it easy to search in an
indexing system like ELK.

The current proposal has some elements that won't work with the way
Swift currently solves this problem. This is mostly due to the
proposed uuid-ish check for validation. However, the Swift solution
has a few aspects that I believe would be very helpful for the entire
community.

NB: Swift returns both an `X-OpenStack-Request-ID` and an `X-Trans-ID`
header in every response. The `X-Trans-ID` was implemented before the
OpenStack request ID was proposed, and so we've kept the `X-Trans-ID` so
as not to break existing clients. The value of `X-OpenStack-Request-ID`
in any response from Swift is simply a mirror of the `X-Trans-ID` value.

The request id in Swift is made up of a few parts:

X-Openstack-Request-Id: txbea0071df2b0465082501-00591b3077saio-extraextra


In the code, this is generated from:

'tx%s-%010x%s' % (uuid.uuid4().hex[:21], time.time(), 
quote(trans_id_suffix))

...meaning that there are three parts to the request id. Let's take
each in turn.

The first part always starts with 'tx' (originally from the
"transaction id") and then is the first 21 hex characters of a uuid4.
The truncation is to limit the overall length of the value.

The second part is the hex value of the current time, padded to 10
characters.
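
As a rough illustration of the layout described so far, the sketch below
builds an ID of the same shape and then reads the timestamp back out of it.
This is not Swift's actual code: `make_trans_id` and `decode_trans_time` are
hypothetical helper names, and `int()` is applied to the timestamp on the
assumption that this runs on Python 3, where `%x` does not accept a float.

```python
import time
import uuid
from urllib.parse import quote


def make_trans_id(trans_id_suffix="", now=None):
    """Build 'tx<21 hex chars>-<10 hex chars of time><suffix>' (hypothetical helper)."""
    now = time.time() if now is None else now
    # int() is required on Python 3, where '%x' rejects floats.
    return "tx%s-%010x%s" % (uuid.uuid4().hex[:21], int(now), quote(trans_id_suffix))


def decode_trans_time(trans_id):
    """Return the Unix timestamp embedded in the ID (hypothetical helper)."""
    # Layout: 'tx' (2 chars) + 21 hex chars + '-' (1 char) + 10 hex chars of time.
    return int(trans_id[24:34], 16)


# The example ID from this message.
example = "txbea0071df2b0465082501-00591b3077saio-extraextra"
print(decode_trans_time(example))
```

Because the time sits at a fixed offset, anyone reading a log line can
recover roughly when the request was received without any other context.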

Finally, the third part is the quoted suffix, and it defaults to the
empty string. The suffix itself can be made of two parts. The first is
configured in the Swift proxy server itself (ie the service that does
the logging) via the `trans_id_suffix` config. This allows an operator
to set a different suffix for each API endpoint or each region or each
cluster in order to help distinguish them in logs. For example, if a
deployment with multiple clusters uses centralized log aggregation, a
different trans_id_suffix value for each clus

Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-16 Thread Sean Dague
On 05/16/2017 11:28 AM, Eric Fried wrote:
>> The idea is that a regular user calling into a service should not
>> be able to set the request id, but outgoing calls from that service
>> to other services as part of the same request would.
> 
> Yeah, so can anyone explain to me why this is a real problem?  If a
> regular user wanted to be a d*ck and inject a bogus (or worse, I
> imagine, duplicated) request-id, can any actual harm come out of it?  Or
> does it just cause confusion to the guy reading the logs later?
> 
> (I'm assuming, of course, that the format will still be validated
> strictly (req-$UUID) to preclude code injection kind of stuff.)

Honestly, I don't know. I know it was once a concern. I'm totally happy
to remove the trust checking knowing we could add it back in later if
required.

Maybe reach out to some public cloud providers to know if they have any
issues with it?

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-16 Thread Eric Fried
> The idea is that a regular user calling into a service should not
> be able to set the request id, but outgoing calls from that service
> to other services as part of the same request would.

Yeah, so can anyone explain to me why this is a real problem?  If a
regular user wanted to be a d*ck and inject a bogus (or worse, I
imagine, duplicated) request-id, can any actual harm come out of it?  Or
does it just cause confusion to the guy reading the logs later?

(I'm assuming, of course, that the format will still be validated
strictly (req-$UUID) to preclude code injection kind of stuff.)
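
A strict check of that kind might look roughly like the sketch below. The
exact pattern is an assumption (lowercase, canonical 8-4-4-4-12 UUID layout),
and `is_valid_request_id` is a hypothetical name, not an existing oslo
function.

```python
import re

# Assumed format: 'req-' followed by a canonical lowercase hex UUID.
_REQ_ID = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def is_valid_request_id(value):
    """Return True only if the whole string is exactly req-$UUID."""
    # fullmatch() rejects any leading or trailing characters, including newlines.
    return isinstance(value, str) and _REQ_ID.fullmatch(value) is not None
```

With a whole-string match, anything carrying extra characters (newlines,
shell metacharacters, log-format tokens) is simply rejected.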

Thanks,
Eric (efried)
.



Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-16 Thread Doug Hellmann
Excerpts from Chris Dent's message of 2017-05-16 15:28:11 +0100:
> On Sun, 14 May 2017, Sean Dague wrote:
> 
> > So, the basic idea is, services will optionally take an inbound 
> > X-OpenStack-Request-ID which will be strongly validated to the format 
> > (req-$uuid). They will continue to always generate one as well. When the 
> > context is built (which is typically about 3 more steps down the paste 
> > pipeline), we'll check that the service user was involved, and if not, 
> > reset 
> > the request_id to the local generated one. We'll log both the global and 
> > local request ids. All of these changes happen in oslo.middleware, 
> > oslo.context, oslo.log, and most projects won't need anything to get this 
> > infrastructure.
> 
> I may not be understanding this paragraph, but this sounds like you
> are saying: accept a valid and authentic incoming request id, but
> only use it in ongoing requests if the service user was involved in
> those requests.
> 
> If that's correct, I'd suggest not doing that because it confuses
> traceability of a series of things. Instead, always use the request
> id if it is valid and authentic.
> 
> But maybe you mean "if the request id could not be proven authentic,
> don't use it"?
> 

The idea is that a regular user calling into a service should not
be able to set the request id, but outgoing calls from that service
to other services as part of the same request would.

Doug



Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-16 Thread Sean Dague
On 05/16/2017 10:28 AM, Chris Dent wrote:
> On Sun, 14 May 2017, Sean Dague wrote:
> 
>> So, the basic idea is, services will optionally take an inbound
>> X-OpenStack-Request-ID which will be strongly validated to the format
>> (req-$uuid). They will continue to always generate one as well. When
>> the context is built (which is typically about 3 more steps down the
>> paste pipeline), we'll check that the service user was involved, and
>> if not, reset the request_id to the local generated one. We'll log
>> both the global and local request ids. All of these changes happen in
>> oslo.middleware, oslo.context, oslo.log, and most projects won't need
>> anything to get this infrastructure.
> 
> I may not be understanding this paragraph, but this sounds like you
> are saying: accept a valid and authentic incoming request id, but
> only use it in ongoing requests if the service user was involved in
> those requests.
> 
> If that's correct, I'd suggest not doing that because it confuses
> traceability of a series of things. Instead, always use the request
> id if it is valid and authentic.
> 
> But maybe you mean "if the request id could not be proven authentic,
> don't use it"?

It is a little clearer in the detailed spec. The issue is that the
place where this is generated comes before we have enough information to
know whether we should be allowed to use it (it's actually before keystone
auth). I put some annotations of paste pipelines inline to help explain.

We either assume success, or assume failure, and fix later. We don't
actually have a functional logger using the request-id until we've got
keystone auth (bootstrapping is fun!) so assuming success, and reverting
if auth says no, actually should cause less confusion (and require less
code) than the other way around.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-16 Thread Chris Dent

On Sun, 14 May 2017, Sean Dague wrote:

> So, the basic idea is, services will optionally take an inbound
> X-OpenStack-Request-ID which will be strongly validated to the format
> (req-$uuid). They will continue to always generate one as well. When the
> context is built (which is typically about 3 more steps down the paste
> pipeline), we'll check that the service user was involved, and if not, reset
> the request_id to the local generated one. We'll log both the global and
> local request ids. All of these changes happen in oslo.middleware,
> oslo.context, oslo.log, and most projects won't need anything to get this
> infrastructure.


I may not be understanding this paragraph, but this sounds like you
are saying: accept a valid and authentic incoming request id, but
only use it in ongoing requests if the service user was involved in
those requests.

If that's correct, I'd suggest not doing that because it confuses
traceability of a series of things. Instead, always use the request
id if it is valid and authentic.

But maybe you mean "if the request id could not be proven authentic,
don't use it"?

--
Chris Dent  ┬──┬◡ノ(° -°ノ)   https://anticdent.org/
freenode: cdent tw: @anticdent


Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-15 Thread Sean Dague
On 05/14/2017 07:04 AM, Sean Dague wrote:
> One of the things that came up in a logging Forum session is how much
> effort operators are having to put into reconstructing flows for things
> like server boot when they go wrong, as every time we jump a service
> barrier the request-id is reset to something new. The back and forth
> between Nova / Neutron and Nova / Glance would be definitely well served
> by this. Especially if this is something that's easy to query in elastic
> search.

FYI the oslo.spec for this is now up here for review -
https://review.openstack.org/#/c/464746/ - it has additional details in it.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-15 Thread Sean Dague
On 05/15/2017 09:35 AM, Doug Hellmann wrote:
> Excerpts from Sean Dague's message of 2017-05-14 07:04:03 -0400:
>> One of the things that came up in a logging Forum session is how much 
>> effort operators are having to put into reconstructing flows for things 
>> like server boot when they go wrong, as every time we jump a service 
>> barrier the request-id is reset to something new. The back and forth 
>> between Nova / Neutron and Nova / Glance would be definitely well served 
>> by this. Especially if this is something that's easy to query in elastic 
>> search.
>>
>> The last time this came up, some people were concerned that trusting 
>> request-id on the wire was concerning to them because it's coming from 
>> random users. We're going to assume that's still a concern by some. 
>> However, since the last time that came up, we've introduced the concept 
>> of "service users", which are a set of higher priv services that we are 
>> using to wrap user requests between services so that long running 
>> request chains (like image snapshot). We trust these service users 
>> enough to keep on trucking even after the user token has expired for 
>> this long run operations. We could use this same trust path for 
>> request-id chaining.
>>
>> So, the basic idea is, services will optionally take an inbound 
>> X-OpenStack-Request-ID which will be strongly validated to the format 
>> (req-$uuid). They will continue to always generate one as well. When the 
> 
> Do all of our services use that format for request ID? I thought Heat
> used something added on to a UUID, or at least longer than a UUID?

Don't know, now is a good time to speak up.
http://logs.openstack.org/85/464585/1/check/gate-heat-dsvm-functional-orig-mysql-lbaasv2-ubuntu-xenial/e1bca9e/logs/screen-h-eng.txt.gz#_2017-05-15_10_08_10_617
seems to indicate that it's using the format everyone else is using.

Swift does things a bit differently with suffixes, but they aren't using
the common middleware.

I've done code look throughs on nova/glance/cinder/neutron/keystone, but
beyond that folks will need to speak up as to where this might break
down. At worst failing validation just means you end up in the old
(current) behavior.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-15 Thread Doug Hellmann
Excerpts from Sean Dague's message of 2017-05-14 07:04:03 -0400:
> One of the things that came up in a logging Forum session is how much 
> effort operators are having to put into reconstructing flows for things 
> like server boot when they go wrong, as every time we jump a service 
> barrier the request-id is reset to something new. The back and forth 
> between Nova / Neutron and Nova / Glance would be definitely well served 
> by this. Especially if this is something that's easy to query in elastic 
> search.
> 
> The last time this came up, some people were concerned that trusting 
> request-id on the wire was concerning to them because it's coming from 
> random users. We're going to assume that's still a concern by some. 
> However, since the last time that came up, we've introduced the concept 
> of "service users", which are a set of higher priv services that we are 
> using to wrap user requests between services so that long running 
> request chains (like image snapshot). We trust these service users 
> enough to keep on trucking even after the user token has expired for 
> this long run operations. We could use this same trust path for 
> request-id chaining.
> 
> So, the basic idea is, services will optionally take an inbound 
> X-OpenStack-Request-ID which will be strongly validated to the format 
> (req-$uuid). They will continue to always generate one as well. When the 

Do all of our services use that format for request ID? I thought Heat
used something added on to a UUID, or at least longer than a UUID?

Doug

> context is built (which is typically about 3 more steps down the paste 
> pipeline), we'll check that the service user was involved, and if not, 
> reset the request_id to the local generated one. We'll log both the 
> global and local request ids. All of these changes happen in 
> oslo.middleware, oslo.context, oslo.log, and most projects won't need 
> anything to get this infrastructure.
> 
> The python clients, and callers, will then need to be augmented to pass 
> the request-id in on requests. Servers will effectively decide when they 
> want to opt into calling other services this way.
> 
> This only ends up logging the top line global request id as well as the 
> last leaf for each call. This does mean that full tree construction will 
> take more work if you are bouncing through 3 or more servers, but it's a 
> step which I think can be completed this cycle.
> 
> I've got some more detailed notes, but before going through the process 
> of putting this into an oslo spec I wanted more general feedback on it 
> so that any objections we didn't think about yet can be raised before 
> going through the detailed design.
> 
> -Sean
> 



Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-15 Thread Sean Dague
On 05/15/2017 08:16 AM, Lance Bragstad wrote:
> 
> 
> On Mon, May 15, 2017 at 6:20 AM, Sean Dague wrote:
> 
> On 05/15/2017 05:59 AM, Andrey Volkov wrote:
> >
> >> The last time this came up, some people were concerned that trusting
> >> request-id on the wire was concerning to them because it's coming from
> >> random users.
> >
> > TBH I don't see the reason why a validated request-id value can't be
> > logged on a callee service side, probably because I missed some previous
> > context. Could you please give an example of such concerns?
> >
> > With service user I see two blocks:
> > - A callee service needs to know if it's "special" user or not.
> > - Until all services don't use a service user we'll not get the 
> complete trace.
> 
> That is doable, but then you need to build special tools to generate
> even basic flows. It means that the Elastic Search use case (where
> plopping in a request id shows you things across services) does not
> work. Because the child flows don't have the new id.
> 
> It's also fine to *also* cross log the child/callee request id on the
> parent/caller, but it's not actually going to be sufficiently useful to
> most people.
> 
> 
> +1
> 
> To me it makes sense to supply the override so that a single request-id
> can track multiple operations across services. But I'm struggling to
> find a case where passing a list(global_request_id, local_request_id) is
> useful. This might be something we can elaborate on later, if we find a
> use case for including multiple request-ids.

I'm not sure I understand the question... so perhaps some examples will help.

The theory is, say you kick off a Nova server build, you'll see
something like:

2017 May 15 nova-api [req-0001---4
req-0001---4 my_project my_user]
2017 May 15 nova-compute [req-0001---4
req-0001---4 my_project my_user]

Then when calling into glance for image download nova would pass in
X-OpenStack-Request-ID: req-0001---4, so that in the
glance logs you'd see:

2017 May 15 glance-api [req-0001---4
req-aef2---7 my_project my_user]

The second id is locally generated during the inbound request. If no
global id is sent (or we decide later that the caller was not
sufficiently trusted), the global id will be set to the local id.
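
The same flow as a small runnable sketch, in case that is easier to follow.
Everything here is illustrative: `handle_inbound` and the `trusted` flag are
hypothetical, format validation is left out for brevity, and the log line
only mimics the shape shown above.

```python
import uuid


def new_request_id():
    """Generate a fresh local id of the form req-$uuid."""
    return "req-" + str(uuid.uuid4())


def handle_inbound(service, headers, project, user, trusted=True):
    """Pick the global id for one inbound request and build its log prefix."""
    local_id = new_request_id()
    sent = headers.get("X-OpenStack-Request-ID")
    # No id sent, or the caller is not trusted: the global id is the local id.
    global_id = sent if (sent and trusted) else local_id
    line = "%s [%s %s %s %s]" % (service, global_id, local_id, project, user)
    return global_id, line


# The user calls nova-api without a global id, then nova passes its id to glance.
nova_global, nova_line = handle_inbound("nova-api", {}, "my_project", "my_user")
_, glance_line = handle_inbound(
    "glance-api", {"X-OpenStack-Request-ID": nova_global}, "my_project", "my_user"
)
print(nova_line)
print(glance_line)
```

Both printed lines share the first id, while the second id differs on the
glance line because it was generated locally there.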

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-15 Thread Lance Bragstad
On Mon, May 15, 2017 at 6:20 AM, Sean Dague wrote:

> On 05/15/2017 05:59 AM, Andrey Volkov wrote:
> >
> >> The last time this came up, some people were concerned that trusting
> >> request-id on the wire was concerning to them because it's coming from
> >> random users.
> >
> > TBH I don't see the reason why a validated request-id value can't be
> > logged on a callee service side, probably because I missed some previous
> > context. Could you please give an example of such concerns?
> >
> > With service user I see two blocks:
> > - A callee service needs to know if it's "special" user or not.
> > - Until all services don't use a service user we'll not get the complete
> trace.
>
> That is doable, but then you need to build special tools to generate
> even basic flows. It means that the Elastic Search use case (where
> plopping in a request id shows you things across services) does not
> work. Because the child flows don't have the new id.
>
> It's also fine to *also* cross log the child/callee request id on the
> parent/caller, but it's not actually going to be sufficiently useful to
> most people.
>

+1

To me it makes sense to supply the override so that a single request-id can
track multiple operations across services. But I'm struggling to find a
case where passing a list(global_request_id, local_request_id) is useful.
This might be something we can elaborate on later, if we find a use case
for including multiple request-ids.


>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>


Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-15 Thread Sean Dague
On 05/15/2017 05:59 AM, Andrey Volkov wrote:
> 
>> The last time this came up, some people were concerned that trusting 
>> request-id on the wire was concerning to them because it's coming from 
>> random users.
> 
> TBH I don't see the reason why a validated request-id value can't be
> logged on a callee service side, probably because I missed some previous
> context. Could you please give an example of such concerns?
> 
> With service user I see two blocks:
> - A callee service needs to know if it's "special" user or not.
> - Until all services don't use a service user we'll not get the complete 
> trace.

That is doable, but then you need to build special tools to generate
even basic flows. It means that the Elastic Search use case (where
plopping in a request id shows you things across services) does not
work. Because the child flows don't have the new id.

It's also fine to *also* cross log the child/callee request id on the
parent/caller, but it's not actually going to be sufficiently useful to
most people.
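
As a toy stand-in for that search use case, the sketch below filters log
lines from several services by one id. The log lines are made up and the ids
are placeholders rather than real request ids; `trace` is a hypothetical
helper, not part of any logging tool.

```python
def trace(log_lines, global_id):
    """Return every line, from any service, that carries global_id."""
    return [line for line in log_lines if global_id in line]


# Made-up sample lines; the ids are placeholders, not real request ids.
LOGS = [
    "nova-api [req-AAAA req-AAAA my_project my_user] boot requested",
    "glance-api [req-AAAA req-BBBB my_project my_user] image download",
    "neutron-server [req-CCCC req-CCCC other_project other_user] port list",
]

for line in trace(LOGS, "req-AAAA"):
    print(line)
```

If the child flow only carried its own id (req-BBBB here), a search for the
parent's id would miss the glance line entirely.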

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-15 Thread Andrey Volkov

> The last time this came up, some people were concerned that trusting 
> request-id on the wire was concerning to them because it's coming from 
> random users.

TBH I don't see the reason why a validated request-id value can't be
logged on a callee service side, probably because I missed some previous
context. Could you please give an example of such concerns?

With service user I see two blocks:
- A callee service needs to know if it's "special" user or not.
- Until all services use a service user, we won't get the complete trace.

Sean Dague writes:

> One of the things that came up in a logging Forum session is how much 
> effort operators are having to put into reconstructing flows for things 
> like server boot when they go wrong, as every time we jump a service 
> barrier the request-id is reset to something new. The back and forth 
> between Nova / Neutron and Nova / Glance would be definitely well served 
> by this. Especially if this is something that's easy to query in elastic 
> search.
>
> The last time this came up, some people were concerned about trusting 
> a request-id on the wire because it's coming from random users. We're 
> going to assume that's still a concern for some. However, since the 
> last time that came up, we've introduced the concept of "service 
> users", which are a set of higher priv services that we are using to 
> wrap user requests between services so that long running request 
> chains (like image snapshot) can complete. We trust these service 
> users enough to keep on trucking even after the user token has expired 
> for these long running operations. We could use this same trust path 
> for request-id chaining.
>
> So, the basic idea is, services will optionally take an inbound 
> X-OpenStack-Request-ID which will be strongly validated to the format 
> (req-$uuid). They will continue to always generate one as well. When the 
> context is built (which is typically about 3 more steps down the paste 
> pipeline), we'll check that the service user was involved, and if not, 
> reset the request_id to the local generated one. We'll log both the 
> global and local request ids. All of these changes happen in 
> oslo.middleware, oslo.context, oslo.log, and most projects won't need 
> anything to get this infrastructure.
>
> The python clients, and callers, will then need to be augmented to pass 
> the request-id in on requests. Servers will effectively decide when they 
> want to opt into calling other services this way.
>
> This only ends up logging the top line global request id as well as the 
> last leaf for each call. This does mean that full tree construction will 
> take more work if you are bouncing through 3 or more servers, but it's a 
> step which I think can be completed this cycle.
>
> I've got some more detailed notes, but before going through the process 
> of putting this into an oslo spec I wanted more general feedback on it 
> so that any objections we didn't think about yet can be raised before 
> going through the detailed design.
>
>   -Sean

-- 
Thanks,

Andrey Volkov,
Software Engineer, Mirantis, Inc.



Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-14 Thread Tim Bell

> On 14 May 2017, at 13:04, Sean Dague  wrote:
> 
> One of the things that came up in a logging Forum session is how much effort 
> operators are having to put into reconstructing flows for things like server 
> boot when they go wrong, as every time we jump a service barrier the 
> request-id is reset to something new. The back and forth between Nova / 
> Neutron and Nova / Glance would be definitely well served by this. Especially 
> if this is something that's easy to query in elastic search.
> 
> The last time this came up, some people were concerned about trusting a 
> request-id on the wire because it's coming from random users. We're going to 
> assume that's still a concern for some. However, since the last time that 
> came up, we've introduced the concept of "service users", which are a set of 
> higher priv services that we are using to wrap user requests between services 
> so that long running request chains (like image snapshot) can complete. We 
> trust these service users enough to keep on trucking even after the user 
> token has expired for these long running operations. We could use this same 
> trust path for request-id chaining.
> 
> So, the basic idea is, services will optionally take an inbound 
> X-OpenStack-Request-ID which will be strongly validated to the format 
> (req-$uuid). They will continue to always generate one as well. When the 
> context is built (which is typically about 3 more steps down the paste 
> pipeline), we'll check that the service user was involved, and if not, reset 
> the request_id to the local generated one. We'll log both the global and 
> local request ids. All of these changes happen in oslo.middleware, 
> oslo.context, oslo.log, and most projects won't need anything to get this 
> infrastructure.
> 
> The python clients, and callers, will then need to be augmented to pass the 
> request-id in on requests. Servers will effectively decide when they want to 
> opt into calling other services this way.
> 
> This only ends up logging the top line global request id as well as the last 
> leaf for each call. This does mean that full tree construction will take more 
> work if you are bouncing through 3 or more servers, but it's a step which I 
> think can be completed this cycle.
> 
> I've got some more detailed notes, but before going through the process of 
> putting this into an oslo spec I wanted more general feedback on it so that 
> any objections we didn't think about yet can be raised before going through 
> the detailed design.

This is very consistent with what I had understood during the forum session. 
Having a single request id across multiple services as the end user operation 
is performed would be a great help in operations, where we are often using a 
solution like Elasticsearch/Kibana to show logs and interactively query the 
timing and results of a given request id. It would also improve traceability 
during investigations where we are aiming to determine who the initial 
requesting user was.

Tim

> 
>   -Sean
> 
> -- 
> Sean Dague
> http://dague.net




[openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-14 Thread Sean Dague
One of the things that came up in a logging Forum session is how much
effort operators are having to put into reconstructing flows for things
like server boot when they go wrong, as every time we cross a service
boundary the request-id is reset to something new. The back and forth
between Nova / Neutron and Nova / Glance would definitely be well served
by a request-id that survives those jumps, especially if it is
something that's easy to query in Elasticsearch.


The last time this came up, some people were concerned about trusting
a request-id on the wire because it's coming from random users. We're
going to assume that's still a concern for some. However, since the
last time that came up, we've introduced the concept of "service
users", which are a set of higher priv services that we are using to
wrap user requests between services so that long running request
chains (like image snapshot) can complete. We trust these service
users enough to keep on trucking even after the user token has expired
for these long running operations. We could use this same trust path
for request-id chaining.


So, the basic idea is, services will optionally take an inbound
X-OpenStack-Request-ID which will be strictly validated against the
format (req-$uuid). They will continue to always generate one as well.
When the context is built (which is typically about 3 more steps down
the paste pipeline), we'll check that the service user was involved,
and if not, reset the request_id to the locally generated one. We'll
log both the global and local request ids. All of these changes happen
in oslo.middleware, oslo.context, oslo.log, and most projects won't
need to change anything to get this infrastructure.
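
As a rough illustration of the validation and context step described
above: this is a sketch only, not the oslo implementation. The function
names are hypothetical, and treating a malformed inbound id the same
way as an untrusted one (falling back to the local id) is an assumption.

```python
import re
import uuid

# Hypothetical helpers; the real logic would live in oslo.middleware and
# oslo.context and may look quite different.
_REQ_ID_RE = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def generate_request_id():
    """Always generate a local id in the req-$uuid format."""
    return "req-%s" % uuid.uuid4()


def is_valid_request_id(value):
    """Strictly validate an inbound id against the req-$uuid format."""
    return isinstance(value, str) and _REQ_ID_RE.fullmatch(value) is not None


def resolve_request_ids(inbound_id, from_service_user):
    """Return (global_request_id, local_request_id).

    The inbound id is kept only when it is well formed and the request
    came in via a trusted service user. Otherwise the global id is reset
    to the locally generated one (assumption for the malformed case).
    """
    local_id = generate_request_id()
    if from_service_user and is_valid_request_id(inbound_id):
        return inbound_id, local_id
    return local_id, local_id
```

Both values returned by resolve_request_ids() would then be logged, so
a log line carries the caller's id as well as the local one.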


The python clients, and callers, will then need to be augmented to pass 
the request-id in on requests. Servers will effectively decide when they 
want to opt into calling other services this way.
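
On the caller side, passing the id along could look roughly like the
following. It is a sketch using only the standard library; the
build_request() helper and its parameters are made up for illustration
and are not part of any python client.

```python
import urllib.request

REQUEST_ID_HEADER = "X-OpenStack-Request-ID"


def build_request(url, token, global_request_id=None):
    """Build (but do not send) a request to another service.

    global_request_id is optional, so a caller only opts in when it has
    an id it wants the callee to chain onto.
    """
    headers = {"X-Auth-Token": token}
    if global_request_id is not None:
        headers[REQUEST_ID_HEADER] = global_request_id
    return urllib.request.Request(url, headers=headers)
```

Keeping the id an optional argument matches the opt-in nature of the
proposal: callers that pass nothing behave exactly as they do today.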


This only ends up logging the top line global request id as well as the 
last leaf for each call. This does mean that full tree construction will 
take more work if you are bouncing through 3 or more servers, but it's a 
step which I think can be completed this cycle.
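
To make the "top line plus last leaf" point concrete, here is a small
sketch of what a log consumer could do with those two ids. The record
field names (service, global_request_id, request_id) are assumptions
for the example, not the actual oslo.log output format.

```python
from collections import defaultdict


def group_by_global_id(records):
    """Map each global request id to the (service, local id) pairs seen.

    This yields a flat list per global id, not a tree: with only the
    global and local ids logged, parent/child links between intermediate
    services cannot be recovered from this data alone.
    """
    flows = defaultdict(list)
    for rec in records:
        pair = (rec["service"], rec["request_id"])
        seen = flows[rec["global_request_id"]]
        if pair not in seen:
            seen.append(pair)
    return dict(flows)
```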


I've got some more detailed notes, but before going through the process 
of putting this into an oslo spec I wanted more general feedback on it 
so that any objections we didn't think about yet can be raised before 
going through the detailed design.


-Sean

--
Sean Dague
http://dague.net
