On Fri, Jan 27, 2017 at 1:31 PM, Eric Anderson <ej...@google.com> wrote:

> On Fri, Jan 27, 2017 at 12:16 PM, 'Mark D. Roth' via grpc.io <
> grpc-io@googlegroups.com> wrote:
>
>>> Yes. And that seems to agree with how the different proxy choosing logic
>>> will work; the first primarily consumes hostnames and returns proxy
>>> hostnames (which is http_proxy in C) and the second one primarily consumes
>>> IPs and returns proxy IPs.
>>>
>>
>> I don't think that's actually entirely correct.  The first case doesn't
>> consume anything; it unconditionally sets the hostname to be resolved.
>>
>
> The first case will consume a hostname in Java. Observing the hostname is
> necessary to fix the mixed internal/external in an expanded view of case 1.
> Since the Java APIs support that mixed case, Java ends up needing to
> support them. And if C ever needed to support the mixed case (which seems
> likely to me), then it would also need to use the hostname.
>
>
>> And the second case can consume either the hostname or the IP.
>>
>
> And I wouldn't be surprised if only IP were used. We're not aware of a
> user of it.
>
>> This is more philosophical than practical,
>>
>
> My further explanation there was meant to be more philosophical, as an
> explanation that this "special case" is pretty normal and sort of agrees
> with the rest of the design.
>
>>>> But that philosophical debate aside, I think that we should focus on case
>>>> 3, because that's a concrete case that we do want to support.  So far, at
>>>> least, I have not heard a workable proposal that does not require the proxy
>>>> mapper to control the CONNECT argument (although I'm certainly still open
>>>> to new proposals).
>>>>
>>>
>>> I've provided two proposals. Neither of which seem debunked as of yet. I
>>> could totally agree they may be worse than what you are proposing, but the
>>> discussion hasn't gotten to that point. The mentioned security issue of the
>>> first proposal seemed to ignore the fact that a reverse proxy could be used
>>> to "protect" the LB, in an identical fashion to any forward proxy.
>>>
>>
>> I don't quite understand the proposed reverse proxy approach.  Can you
>> explain how that would work in more detail?
>>
>
> Case 3 as stated today (for contrasting)
>
>    1. client wants to connect to service.example.com
>    2. do DNS SRV resolution for _grpclb._tcp.service.example.com; you
>    find it is a LB with name lb.example.com
>    3. do a DNS resolution for lb.example.com, get IP 1.2.3.4
>    4. ask the proxy mapper about IP 1.2.3.4, it recognizes the IP as the
>    proxy and says to use "CONNECT service.example.com" via proxy IP
>    1.2.3.4
>    5. connect to proxy 1.2.3.4, it performs internal resolution of
>    service.example.com and connects to one of the hosts
>
That's not actually an accurate representation of how case 3 is proposed
to work in the current document.  The document is actually proposing the
following:

   1. client wants to connect to service.example.com
   2. do DNS SRV resolution for _grpclb._tcp.service.example.com; you find
   it is a LB with name lb.example.com
   3. do a DNS resolution for lb.example.com, get IP 1.2.3.4
   4. ask the proxy mapper about IP 1.2.3.4; it recognizes the IP as the
   proxy and says to use "CONNECT lb.example.com" via proxy IP 1.2.3.4
   5. connect to proxy 1.2.3.4 with "CONNECT lb.example.com"; proxy does
   internal name resolution and connects to one of the load balancers
   6. send grpclb request; get response indicating that the backend server
   is 5.6.7.8
   7. ask the proxy mapper about IP 5.6.7.8; it recognizes it as an
   internal IP address and says to use "CONNECT 5.6.7.8" via proxy IP 1.2.3.4
   8. connect to proxy 1.2.3.4 with "CONNECT 5.6.7.8"; proxy connects to
   the specified backend server

Remember that the goal of case 3 is to allow client-side per-call load
balancing, despite not being able to resolve the internal names of the
backend servers.  Instead of getting those from DNS, we get them from the
grpclb balancer.
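
To make the two proxy mapper decisions in the steps above (4 and 7) concrete,
here is a minimal sketch in Python. It is not gRPC's actual API: the names
ProxyDecision, map_proxy, PROXY_IPS, INTERNAL_NETWORKS, and DEFAULT_PROXY_IP
are hypothetical, and the 5.6.0.0/16 range is only an assumed example of an
"internal" network. Ports are omitted, as in the steps above; a real CONNECT
target would be host:port.

```python
import ipaddress
from typing import NamedTuple, Optional


class ProxyDecision(NamedTuple):
    proxy_ip: str        # address the client actually dials
    connect_target: str  # argument sent in the HTTP CONNECT request


# Hypothetical client-side configuration (for example, read from a local file).
PROXY_IPS = {"1.2.3.4"}
INTERNAL_NETWORKS = [ipaddress.ip_network("5.6.0.0/16")]  # assumed range
DEFAULT_PROXY_IP = "1.2.3.4"


def map_proxy(resolved_ip: str, name: Optional[str] = None) -> Optional[ProxyDecision]:
    """Decide whether a connection to resolved_ip must go through the proxy.

    Returns None when the address can be dialed directly.
    """
    if resolved_ip in PROXY_IPS:
        # Step 4: DNS handed us the proxy itself, so CONNECT to the name
        # that was originally resolved (the balancer name).
        if name is None:
            raise ValueError("a hostname is required when the address is a proxy")
        return ProxyDecision(resolved_ip, name)
    if any(ipaddress.ip_address(resolved_ip) in net for net in INTERNAL_NETWORKS):
        # Step 7: an internal backend address returned by grpclb; CONNECT
        # to the IP itself via the configured proxy.
        return ProxyDecision(DEFAULT_PROXY_IP, resolved_ip)
    return None
```

In this sketch, map_proxy("1.2.3.4", "lb.example.com") corresponds to step 4
and map_proxy("5.6.7.8") to step 7. Note that the first branch only works if
the mapper's list of proxy addresses matches what DNS returns.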



> Case 3 using reverse proxy for LB
>
>    1. client wants to connect to service.example.com
>    2. do DNS SRV resolution for _grpclb._tcp.service.example.com; you
>    find it is a LB with name lb.example.com
>    3. do a DNS resolution for lb.example.com, get IP 1.2.3.4
>    4. (different starting here) connect to 1.2.3.4, which is a
>    transparent reverse proxy
>    5. Perform an RPC to 1.2.3.4. Host header is lb.example.com. The proxy
>    performs internal mapping of lb.example.com to internal addresses and
>    connects to one of the hosts, forwarding the RPC.
>
The reverse proxy approach is essentially what I originally suggested for
case 3, but Julien argued that it would be a security problem.

Keep in mind that in case 3, the grpclb load balancers and the server
backends are in the same internal domain, with the same access
restrictions.  If we can't use a reverse proxy to access the server
backends, I don't think we'll be able to do that for the grpclb balancers
either.

That having been said, as a security issue, Julien can address this
directly.



>
>> I agree that case 3 requires different parts of the system to be
>> coordinated.  For example, assuming that your proxy mapper implementation
>> is getting the list of proxy addresses from a local file, you would need to
>> first push an updated list that contains the new proxy address to all
>> clients.  Then, once all clients have been updated, you can add the new
>> proxy to DNS.
>>
>
> And the file needs to contain old proxy addresses that should be used for
> detection but not be used.
>
> Okay. So we're on the same page there.
>
>> I agree that this is cumbersome, but I think it's an inherent problem with
>> case 3, because you need *some* way to configure the clients.
>>
>
> I agree you need to be able to configure the clients. I understand that
> something needs to tell the client what to do. My concern was the pain of
> updating the proxy mapping list in concert with name resolution. And
> because of that I would recommend implementors to use the magic IP, because
> it has less operational overhead and less likelihood of failing.
>
>>> If you assume "one proxy" which has "one static IP" and everything is
>>> hard-coded, then the design is fine. But that seems unlikely to describe a
>>> productionized system. And that's why I would feel forced to use the "magic
>>> IP" that it seems you have previously rejected.
>>>
>>
>> There are a couple of reasons that I don't like the "magic IP" approach.
>> First, it requires writing a custom resolver in addition to a custom proxy
>> mapper,
>>
>
> No, I'd just have DNS return the trash IP.
>

This actually makes me even less happy with the sentinel-value approach,
because now we wouldn't just be using the value internally in a particular
piece of software; we'd actually be publishing it in a way that would be
very confusing when people were trying to debug the system from an
operational perspective.  ("Wait, why is the client even attempting to
connect to the proxy, since DNS points it at this bogus IP address?")



>
>> I'm not a big fan of "sentinel" values, since it's often hard to find a
>> value that will never be used in real life.
>>
>
> I would gladly accept a magic value instead of needing to make sure two
> systems stay in sync and rollouts happen properly. And I would quickly
> recommend that to others. And if I started explaining the gotchas of the
> alternative, I'd expect them to quickly be thankful for the recommendation
> since it is less code to write and less operational complexity.
>

I do see your point, but I think that the sentinel-value approach has
operational downsides of its own.  There are pros and cons here, so it
basically boils down to a judgement call, and personally, I prefer the
alternative that's currently outlined in the doc.


Just thinking out loud here about whether there's another alternative --
this is a purely brainstorming-level idea, so please feel free to shoot
holes in it.  What if we had another type of SRV record specifically for
HTTP CONNECT proxy use?  The presence of that record would tell the client
to connect to that address and issue a CONNECT request using the originally
looked up name.  With that, case 3 would look something like this:

   1. client wants to connect to service.example.com
   2. do DNS SRV resolution for _grpclb._tcp.service.example.com; you find
   it is a LB with name lb.example.com
   3. do DNS SRV resolution for _grpc_proxy._tcp.lb.example.com; you find
   it is a proxy with name proxy.example.com
   4. do DNS lookup for proxy.example.com; get IP 1.2.3.4
   5. connect to proxy 1.2.3.4 with "CONNECT lb.example.com"; proxy does
   internal name resolution and connects to one of the load balancers
   6. send grpclb request; get response indicating that the backend server
   is 5.6.7.8
   7. ask the proxy mapper about IP 5.6.7.8; it recognizes it as an
   internal IP address and says to use "CONNECT 5.6.7.8" via proxy IP 1.2.3.4
   8. connect to proxy 1.2.3.4 with "CONNECT 5.6.7.8"; proxy connects to
   the specified backend server

In this case, there's no proxy mapper involved in the grpclb connection,
only for the backend connections, so the proxy mapper doesn't need to sync
up with the resolver result (which would seem to ameliorate your concern).
The downsides are that there are more DNS lookups involved, and that we
might need to extend the resolver API so that it can pass down richer
information (not sure about that -- would need to think about this more
fully).
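
As a rough illustration of steps 1 through 5 of this brainstormed flow, here
is a small Python sketch. Everything in it is an assumption for illustration:
the in-memory SRV_RECORDS and A_RECORDS tables stand in for real DNS, the
function name plan_balancer_connection is hypothetical, and the
_grpc_proxy._tcp record type is only the idea floated above, not an existing
convention.

```python
from typing import Optional, Tuple

# Hypothetical stand-ins for real DNS data.
SRV_RECORDS = {
    "_grpclb._tcp.service.example.com": "lb.example.com",
    "_grpc_proxy._tcp.lb.example.com": "proxy.example.com",
}
A_RECORDS = {"proxy.example.com": "1.2.3.4"}


def plan_balancer_connection(service: str) -> Tuple[str, Optional[str]]:
    """Return (ip_to_dial, connect_target) for reaching the grpclb balancer.

    connect_target is None when no proxy record exists and the balancer
    can be dialed directly.
    """
    # Step 2: find the balancer name for the service.
    lb_name = SRV_RECORDS.get("_grpclb._tcp." + service)
    if lb_name is None:
        raise LookupError("no grpclb SRV record for " + service)

    # Step 3: look for the (hypothetical) proxy SRV record for the balancer.
    proxy_name = SRV_RECORDS.get("_grpc_proxy._tcp." + lb_name)

    # Step 4: resolve whichever name we are actually going to dial.
    dial_name = proxy_name if proxy_name is not None else lb_name
    ip = A_RECORDS.get(dial_name)
    if ip is None:
        raise LookupError("no address record for " + dial_name)

    # Step 5: when going through a proxy, CONNECT to the balancer name.
    connect_target = lb_name if proxy_name is not None else None
    return ip, connect_target
```

For the records above, the client would dial 1.2.3.4 and issue "CONNECT
lb.example.com" without consulting the proxy mapper at all; the mapper is
still needed for the backend addresses in steps 7 and 8.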

I'm not sure that this approach is really worth the additional complexity,
but I figured I'd shoot it out there and see what you think.  Thoughts...?

-- 
Mark D. Roth <r...@google.com>
Software Engineer
Google, Inc.
