Hi,

Maybe the missing link here is another component of a stateless JWT architecture: *blacklisting* malicious tokens when necessary. This is obviously a form of state that has to be kept in a datastore, but it is quite different from doing full auth queries: it is easy to scale and has less performance impact (I guess especially under DDoS). I believe this should be the approach on the API Gateway roadmap.

Thanks
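To make the blacklisting idea a bit more concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the `TokenBlacklist` class and `is_request_allowed` function are hypothetical names, the dict stands in for whatever shared datastore would really be used, and the claims are assumed to come from a token whose signature has already been verified. It relies only on the standard `jti` (token ID) and `exp` (expiry) JWT claims.

```python
import time


class TokenBlacklist:
    """In-memory stand-in for a shared datastore of revoked token IDs."""

    def __init__(self):
        # Maps token ID (the JWT "jti" claim) to that token's expiry time.
        self._revoked = {}

    def revoke(self, jti, expires_at):
        """Blacklist a token until the moment it would have expired anyway."""
        self._revoked[jti] = expires_at

    def is_revoked(self, jti, now=None):
        """Return True if the token ID is blacklisted and not yet expired."""
        now = time.time() if now is None else now
        self._purge(now)
        return jti in self._revoked

    def _purge(self, now):
        # Entries for expired tokens can be dropped: the normal expiry
        # check already rejects those tokens, so the list stays small.
        expired = [j for j, exp in self._revoked.items() if exp <= now]
        for j in expired:
            del self._revoked[j]


def is_request_allowed(claims, blacklist, now=None):
    """Accept already-verified JWT claims unless expired or blacklisted."""
    now = time.time() if now is None else now
    if claims["exp"] <= now:
        return False
    return not blacklist.is_revoked(claims["jti"], now)
```

The point of the design is that the per-request lookup is a single key check against a set that only ever holds revoked, unexpired tokens, rather than a full user/permissions query.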
On 9 May 2017 21:14, "Chris Lemmons" <alfic...@gmail.com> wrote:
> I'll second the principle behind "start with security, optimize when
> there's a problem".
>
> It seems to me that in order to maintain security, basically everyone would
> need to dial the revalidate time so close to zero that it does very little
> good as a cache on the credentials. Otherwise, as Rob has pointed out, the
> TTL on your credential cache is effectively "how long am I ok with hackers
> in control after I find them". Practically, it also means that much lag on
> adding or removing permissions. That effectively means a database hit for
> every query, or near enough to every query as not to matter.
>
> That said, you can get the best of multiple worlds, I think. The only DB
> query that really has to be done is "give me the last update time for this
> user". Compare that to the generation time in the token and 99% of the
> time, it's the only query you need. With that check, you can even use
> fairly long-lived tokens. If anything about the user has changed, reject
> the token, generate a new one, send that to the user and use it. The
> regenerate step is somewhat expensive, but still well inside reasonable, I
> think.
>
> On Tue, May 9, 2017 at 11:31 AM Robert Butts <robert.o.bu...@gmail.com>
> wrote:
>
> > > The TO service (and any other service that requires auth) MUST hit the
> > > database (or the auth service, which itself hits the database) to verify
> > > valid tokens' users still have the permissions they did when the token was
> > > created. Otherwise, it's impossible to revoke tokens, e.g. if an employee
> > > quits, or an attacker gains a token, or a user changes their password.
> >
> > I'm elaborating on this, and moving a discussion from a PR review here.
> >
> > From the code submissions to the repo, it appears the current plan is for
> > the API Gateway to create a JWT, and then for that JWT to be accepted by
> > all Traffic Ops microservices, with no database authentication.
> >
> > It's a common misconception that JWT allows you to authenticate without
> > hitting the database. This is an exceedingly dangerous misconception. If
> > you don't check the database when every authenticated route is requested,
> > it's impossible to revoke access. In practice, this means the JWT TTL
> > becomes the length of time _after you discover an attacker is manipulating
> > your production system_, before it's _possible_ to evict them.
> >
> > How long do you feel is acceptable to have a hacker in and manipulating
> > your system, after you discover them? A day? An hour? Five minutes?
> > Whatever your TTL, that's the length of time you're willing to allow a
> > hacker to steal and destroy your and your customers' data. Worse, because
> > this is a CDN, it's the length of time you're willing to allow your CDN to
> > be used to DDOS a target.
> >
> > Are you going to explain in court that the DDOS your system executed lasted
> > 24 hours, or 1 hour, or 10 minutes after you discovered it, because that's
> > the TTL you hard-coded? Are you going to explain to a judge and prosecuting
> > attorney exactly which sensitive data was stolen in the ten minutes after
> > you discovered the attacker in your system, before their JWT expired?
> >
> > If you're willing to accept the legal consequences, that's your business.
> > Apache Traffic Control should not require users to accept those
> > consequences, and ideally shouldn't make it possible, as many users won't
> > understand the security risks.
> >
> > The argument has been made "authorization does not check the database to
> > avoid congestion" -- Has anyone tested this in practice? The database query
> > itself is 50ms. Assuming your database and service are 2500km apart, that's
> > another 50ms network latency. Traffic Ops has endpoints that take 10s to
> > generate. Worst-case scenario, this will double the time of tiny endpoints
> > to 200ms, and increase large endpoints inconsequentially.
> > It's highly unlikely performance is an issue in practice.
> >
> > As Jan said, we can still have the services check the auth as well after
> > the proxy auth. Moreover, the services don't even have to know about the
> > auth service, they can hit a mapped route on the API Gateway, which gives
> > us better modularisation and separation of concerns.
> >
> > It's not difficult, it can be a trivial endpoint on the auth service,
> > remapped in the API Gateway, which takes the JWT token and returns true if
> > it's still authorized in the database. To be clear, this is not a problem
> > today. Traffic Ops still uses the Mojolicious cookie today, so this would
> > only need done if and when we remove that, or if we move authorized
> > endpoints out of Traffic Ops into their own microservices.
> >
> > Considering the significant security and legal risks, we should always hit
> > the database to validate requests of authorized endpoints, and reconsider
> > if and when someone observes performance issues in practice.
> >
> >
> > On Tue, May 9, 2017 at 6:56 AM, Dewayne Richardson <dewr...@gmail.com>
> > wrote:
> >
> > > If only the API GW authenticates/authorizes we also have a single point of
> > > entry to test for security instead of having it sprinkled across services
> > > in different ways. It also simplifies the code on the service side and
> > > makes them easier to test with automation.
> > >
> > > -Dew
> > >
> > > On Mon, May 8, 2017 at 8:42 AM, Robert Butts <robert.o.bu...@gmail.com>
> > > wrote:
> > >
> > > > > couldn't make nginx or http do what we need.
> > > >
> > > > I was suggesting a different architecture. Not making the proxy do auth,
> > > > only standard proxying.
> > > >
> > > > > We can still have the services check the auth as well after the proxy
> > > > > auth
> > > >
> > > > +1
> > > >
> > > >
> > > > On Mon, May 8, 2017 at 3:36 AM, Amir Yeshurun <am...@qwilt.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Let me elaborate some more on the purpose of the API GW. I will put up a
> > > > > wiki page following our discussions here.
> > > > >
> > > > > Main purpose is to allow innovation by creating new services that handle TO
> > > > > functionality, not as a part of the monolithic Mojo app.
> > > > > The long term vision is to de-compose TO into multiple microservices,
> > > > > allowing new functionality to be easily added.
> > > > > Indeed, the goal is to eventually deprecate the current AAA model, and
> > > > > replace it with the new AAA model currently under work (user-roles,
> > > > > role-capabilities)
> > > > >
> > > > > I think that handling authorization in the API layer is a valid approach.
> > > > > Security wise, I don't see much difference between that, and having each
> > > > > module access the auth service, as long as the auth service is deployed in
> > > > > the backend.
> > > > > Having another proxy (nginx?) fronting the world and forwarding all
> > > > > requests to the backend GW mitigates the risk for compromising the
> > > > > authorization service.
> > > > > However, as mentioned above, we can still have the services check the auth
> > > > > as well after the proxy auth.
> > > > >
> > > > > It is a standalone process, completely optional at this point. One can
> > > > > choose to deploy it in order to allow integration with additional
> > > > > services. Deployment and management are still T.B.D, and feedback on this
> > > > > is most welcome.
> > > > >
> > > > > Regarding token validation and revocation:
> > > > > Tokens have expiration time. Expired tokens do not pass token validation.
> > > > > In production, expiration should be set to a relatively short time, say 5
> > > > > minutes.
> > > > > This way revocation is automatic. Re-authentication is handled via refresh
> > > > > tokens (not implemented yet). Hitting the DB upon every API call causes
> > > > > congestion on the users DB.
> > > > > To avoid that, we chose to have all user information self-contained inside
> > > > > the JWT.
> > > > >
> > > > > Thanks
> > > > > /amiry
> > > > >
> > > > > On Mon, May 8, 2017 at 5:42 AM Jan van Doorn <j...@knutsel.com> wrote:
> > > > >
> > > > > > It's the reverse proxy we've discussed for the "micro services" version for
> > > > > > a while now (as in
> > > > > > https://cwiki.apache.org/confluence/display/TC/Design+Overview+v3.0 ).
> > > > > >
> > > > > > On Sun, May 7, 2017 at 7:22 PM Eric Friedrich (efriedri) <
> > > > > > efrie...@cisco.com>
> > > > > > wrote:
> > > > > >
> > > > > > > From a higher level- what is the purpose of the API Gateway? It seems like
> > > > > > > there may have been some previous discussions about API Gateway. Are there
> > > > > > > any notes or description that I can catch up on?
> > > > > > >
> > > > > > > How will it be deployed? (Is it a standalone service or something that
> > > > > > > runs inside the experimental Traffic Ops)?
> > > > > > >
> > > > > > > Is this new component required or optional?
> > > > > > >
> > > > > > > —Eric
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On May 7, 2017, at 8:28 PM, Jan van Doorn <j...@knutsel.com> wrote:
> > > > > > > >
> > > > > > > > I looked into this a year or so ago, and I couldn't make nginx or http do
> > > > > > > > what we need.
> > > > > > > >
> > > > > > > > We can still have the services check the auth as well after the proxy auth,
> > > > > > > > and make things better than today, where we have the same problem that if
> > > > > > > > the TO mojo app is compromised, everything is compromised.
> > > > > > > >
> > > > > > > > If we always route to TO, we don't untangle the mess of being dependent on
> > > > > > > > the monolithic TO for everything. Many services today, and more in the
> > > > > > > > future really just need a check to see if the user is authorized, and
> > > > > > > > nothing more.
> > > > > > > >
> > > > > > > > On Sun, May 7, 2017 at 11:55 AM Robert Butts <robert.o.bu...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> What are the advantages of these config files, over an existing reverse
> > > > > > > >> proxy, like Nginx or httpd? It's just as much work as configuring and
> > > > > > > >> deploying an existing product, but more code we have to write and maintain.
> > > > > > > >> I'm having trouble seeing the advantage.
> > > > > > > >>
> > > > > > > >> -1 on auth rules as a part of the proxy. Making a proxy care about auth
> > > > > > > >> violates the Single Responsibility Principle, and further, is a security
> > > > > > > >> risk. It creates unnecessary attack surface. If your proxy app or server is
> > > > > > > >> compromised, the entire framework is now compromised. An attacker could
> > > > > > > >> simply rewrite the proxy config to make all routes no-auth.
> > > > > > > >>
> > > > > > > >> The simple alternative is for the proxy to always route to TO, and TO
> > > > > > > >> checks the token against the auth service (which may also be proxied), and
> > > > > > > >> redirects unauthorized requests to a login endpoint (which may also be
> > > > > > > >> proxied).
> > > > > > > >>
> > > > > > > >> The TO service (and any other service that requires auth) MUST hit the
> > > > > > > >> database (or the auth service, which itself hits the database) to verify
> > > > > > > >> valid tokens' users still have the permissions they did when the token was
> > > > > > > >> created. Otherwise, it's impossible to revoke tokens, e.g. if an employee
> > > > > > > >> quits, or an attacker gains a token, or a user changes their password.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Sun, May 7, 2017 at 4:35 AM, Amir Yeshurun <am...@qwilt.com> wrote:
> > > > > > > >>
> > > > > > > >>> Seems that attachments are stripped on this list.
> > > > > > > >>> Examples pasted below
> > > > > > > >>>
> > > > > > > >>> *rules.json*
> > > > > > > >>> [
> > > > > > > >>>     { "host": "localhost", "path": "/login", "forward": "localhost:9004", "scheme": "https", "auth": false },
> > > > > > > >>>     { "host": "localhost", "path": "/api/1.2/innovation/", "forward": "localhost:8004", "scheme": "http", "auth": true, "routes-file": "innovation.json" },
> > > > > > > >>>     { "host": "localhost", "path": "/api/1.2/", "forward": "localhost:3000", "scheme": "http", "auth": true, "routes-file": "traffic-ops-routes.json" },
> > > > > > > >>>     { "host": "localhost", "path": "/internal/api/1.2/", "forward": "localhost:3000", "scheme": "http", "auth": true, "routes-file": "internal-routes.json" }
> > > > > > > >>> ]
> > > > > > > >>>
> > > > > > > >>> *traffic-ops-routes.json (partial)*
> > > > > > > >>> .
> > > > > > > >>> .
> > > > > > > >>> .
> > > > > > > >>>     { "match": "/cdns/health", "auth": { "GET": ["cdn-health-read"] }},
> > > > > > > >>>     { "match": "/cdns/capacity", "auth": { "GET": ["cdn-health-read"] }},
> > > > > > > >>>     { "match": "/cdns/usage/overview", "auth": { "GET": ["cdn-stats-read"] }},
> > > > > > > >>>     { "match": "/cdns/name/dnsseckeys/generate", "auth": { "GET": ["cdn-security-keys-read"] }},
> > > > > > > >>>     { "match": "/cdns/name/[^\/]+/?", "auth": { "GET": ["cdn-read"] }},
> > > > > > > >>>     { "match": "/cdns/name/[^\/]+/sslkeys", "auth": { "GET": ["cdn-security-keys-read"] }},
> > > > > > > >>>     { "match": "/cdns/name/[^\/]+/dnsseckeys", "auth": { "GET": ["cdn-security-keys-read"] }},
> > > > > > > >>>     { "match": "/cdns/name/[^\/]+/dnsseckeys/delete", "auth": { "GET": ["cdn-security-keys-write"] }},
> > > > > > > >>>     { "match": "/cdns/[^\/]+/queue_update", "auth": { "POST": ["queue-updates-write"] }},
> > > > > > > >>>     { "match": "/cdns/[^\/]+/snapshot", "auth": { "PUT": ["cdn-config-snapshot-write"] }},
> > > > > > > >>>     { "match": "/cdns/[^\/]+/health", "auth": { "GET": ["cdn-health-read"] }},
> > > > > > > >>>     { "match": "/cdns/[^\/]+/?", "auth": { "GET": ["cdn-read"], "PUT": ["cdn-write"], "PATCH": ["cdn-write"], "DELETE": ["cdn-write"] }},
> > > > > > > >>>     { "match": "/cdns", "auth": { "GET": ["cdn-read"], "POST": ["cdn-write"] }},
> > > > > > > >>>
> > > > > > > >>> .
> > > > > > > >>> .
> > > > > > > >>> .
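For anyone trying to picture how the two rule files quoted above fit together, here is a rough sketch of the lookup as I understand it from the thread: a prefix match on the forwarding rules, then a regex match on the authorization rules, then a capability comparison. This is not the code from the PR. The `authorize` function and the inlined `"routes"` key (standing in for `"routes-file"`) are made up for illustration, and I am assuming that the first matching rule wins and that the regex is matched against the whole remainder of the path.

```python
import re

# Hypothetical in-memory copies of the two rule sets, trimmed from the
# quoted examples; "routes" stands in for the contents of "routes-file".
FORWARDING_RULES = [
    {"path": "/login", "forward": "localhost:9004", "auth": False, "routes": []},
    {"path": "/api/1.2/", "forward": "localhost:3000", "auth": True,
     "routes": [
         {"match": "/cdns/[^/]+/health", "auth": {"GET": ["cdn-health-read"]}},
         {"match": "/cdns/[^/]+/?", "auth": {"GET": ["cdn-read"], "PUT": ["cdn-write"]}},
         {"match": "/cdns", "auth": {"GET": ["cdn-read"], "POST": ["cdn-write"]}},
     ]},
]


def authorize(method, path, user_capabilities):
    """Return the forward target if the request is allowed, else None."""
    for rule in FORWARDING_RULES:
        # Step 1: prefix match on the request path picks the forwarding rule.
        if not path.startswith(rule["path"]):
            continue
        if not rule["auth"]:
            return rule["forward"]
        # Step 2: regex match on the rest of the path picks the auth rule.
        rest = "/" + path[len(rule["path"]):]
        for route in rule["routes"]:
            if re.fullmatch(route["match"], rest):
                required = route["auth"].get(method)
                if required is None:
                    return None  # method not listed for this route
                # Step 3: every required capability must be in the JWT.
                if set(required) <= set(user_capabilities):
                    return rule["forward"]
                return None
        return None  # no authorization rule covers this path
    return None  # no forwarding rule covers this path
```

For example, `authorize("GET", "/api/1.2/cdns/foo/health", ["cdn-health-read"])` returns `"localhost:3000"`, while the same request with only `["cdn-read"]` is refused. Note that the capabilities come straight from the token here, which is exactly the part the revocation discussion above is about.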
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> On Sun, May 7, 2017 at 12:39 PM Amir Yeshurun <am...@qwilt.com> wrote:
> > > > > > > >>>
> > > > > > > >>>> Attached please find examples for forwarding rules file (rules.json) and
> > > > > > > >>>> the authorization rules file (traffic-ops-routes.json)
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> On Sun, May 7, 2017 at 10:39 AM Amir Yeshurun <am...@qwilt.com> wrote:
> > > > > > > >>>>
> > > > > > > >>>>> Hi all,
> > > > > > > >>>>>
> > > > > > > >>>>> I am about to submit a PR with a first operational version of the API GW,
> > > > > > > >>>>> to the "experimental" code base.
> > > > > > > >>>>>
> > > > > > > >>>>> The API GW forwarding logic is as follows:
> > > > > > > >>>>>
> > > > > > > >>>>>    1. Find host to forward the request: Prefix match on the request path
> > > > > > > >>>>>    against a list of forwarding rules. The matched forwarding rule defines the
> > > > > > > >>>>>    target's host, and the target's *authorization rules*.
> > > > > > > >>>>>    2. Authorization: Regex match on the request path against a list of
> > > > > > > >>>>>    *authorization rules*. The matched rule defines the required capabilities to
> > > > > > > >>>>>    perform the HTTP method on the route. These capabilities are compared
> > > > > > > >>>>>    against the user's capabilities in the user's JWT
> > > > > > > >>>>>
> > > > > > > >>>>> At this moment, the 2 sets of rules are hard-coded in json files. The
> > > > > > > >>>>> files are provided with the API GW distribution and contain definitions for
> > > > > > > >>>>> TC 2.0 API routes.
> > > > > > > >>>>> I have tested parts of the API, however, there might be
> > > > > > > >>>>> mistakes in some of the routes. Please be warned.
> > > > > > > >>>>>
> > > > > > > >>>>> Considering manageability and high availability, I am aware that using
> > > > > > > >>>>> local files for storing the set of authorization rules is inferior to
> > > > > > > >>>>> centralized configuration.
> > > > > > > >>>>>
> > > > > > > >>>>> We are considering different approaches for centralized configuration,
> > > > > > > >>>>> having the following points in mind
> > > > > > > >>>>>
> > > > > > > >>>>>    - Microservice world: API GW will front multiple services, not only
> > > > > > > >>>>>    Mojo. It can also front other TC components like Traffic Stats and Traffic
> > > > > > > >>>>>    Monitor. Each service defines its own routes and capabilities. Here comes
> > > > > > > >>>>>    the question of what is the "source of truth" for the route definitions.
> > > > > > > >>>>>    - Handling private routes. API GW may front non-TC services.
> > > > > > > >>>>>    - User changes to the AAA scheme. The ability for an admin user to make
> > > > > > > >>>>>    changes in the required capabilities of a route, maybe even define new
> > > > > > > >>>>>    capability names, was raised in the past as a use case that should be
> > > > > > > >>>>>    supported.
> > > > > > > >>>>>    - Easy development and deployment of new services.
> > > > > > > >>>>>    - Using TO DB for expediency.
> > > > > > > >>>>>
> > > > > > > >>>>> I would appreciate any feedback and views on your approach to manage
> > > > > > > >>>>> route definitions.
> > > > > > > >>>>>
> > > > > > > >>>>> Thanks
> > > > > > > >>>>> /amiry
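For comparison with blacklisting, the "last update time" check suggested earlier in the thread could look roughly like the sketch below. It is only an illustration under stated assumptions: `check_token` and the `LAST_UPDATED` dict are hypothetical, the dict stands in for the single "give me the last update time for this user" query, and the claims are assumed to come from a token whose signature has already been verified. It uses the standard `sub`, `iat` and `exp` JWT claims.

```python
# Stand-in for the one cheap query "give me the last update time for this
# user"; in a real deployment this would be a lookup in the users DB.
LAST_UPDATED = {"alice": 100.0}


def check_token(claims, now, last_updated=LAST_UPDATED):
    """Classify already-verified JWT claims as accept, reissue or reject."""
    if claims["exp"] <= now:
        return "reject"  # expired tokens never pass
    updated = last_updated.get(claims["sub"])
    if updated is None:
        return "reject"  # unknown or deleted user
    if updated > claims["iat"]:
        # The user changed after the token was issued: the token is stale,
        # so fall back to the expensive path and generate a fresh one.
        return "reissue"
    return "accept"
```

With this, any change to a user (password, role, deactivation) only has to bump that user's last update time, and every token issued before that moment stops being accepted as-is.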