Re: [grpc-io] Re: gRFC A1: HTTP CONNECT proxy support

'Mark D. Roth' via grpc.io Thu, 26 Jan 2017 12:46:30 -0800

On Thu, Jan 26, 2017 at 11:58 AM, Eric Anderson <ej...@google.com> wrote:

> On Thu, Jan 26, 2017 at 8:42 AM, 'Mark D. Roth' via grpc.io <
> grpc-io@googlegroups.com> wrote:
>>
>>>>>>    - *All* requests must go through the proxy, both for internal and
>>>>>> external servers.
>>>>>
>>>>>
>>>>> This is not true. It only applies to external servers. It directly
>>>>> contradicts the earlier "outbound to the internet." I could maybe agree
>>>>> with it if it said "may" instead of "must."
>>>>>
>>>>
>>>> My understanding is that if the http_proxy environment variable is set,
>>>> then the proxy is used unconditionally for all servers, so I think this is
>>>> accurate.  I've updated the wording in the description of this case to make
>>>> it clear that this is not just for outbound traffic.
>>>>
>>>
>>> That's conflating two things: the environment and the configuration.
>>> Your description of the environment is not true. When in this environment
>>> we expect the http_proxy environment variable as the form of configuration,
>>> but that has no impact on how the environment actually behaves.
>>>
>>
>> What we actually care about here is the configuration, which is that all
>> connections go through the proxy.
>>
>
> But... that's not what it says. It says, "We are aware of the following
> use-cases for TCP-level proxying with gRPC" and then follows with "A corp
> environment where all traffic (especially traffic outbound to the Internet)
> must go through a proxy." I'm not aware of that use-case/environment.
>
> As I said though, if you soften "must" to "may", I could get behind it.
> Otherwise I'm not aware of Case 1 existing at all in the world, so let's
> not support it.
>
> Really though, I'm not sure how often the proxies are unable to load
> internal resources. And even if they are able to, the solution probably
> isn't going to be satisfactory for users, because of performance. If it
> were me I'd frame it where some application *only* needs to access
> external resources. http_proxy doesn't solve the mixed case, so let's just
> call that out up-front.
>
>> but our code doesn't actually care about that; it just cares about what
>> configuration we need to support.
>>
>
> Then delete the use cases and just describe the configuration, if that's
> all that matters. That is to say, the use cases are important for people,
> not the code. And the document is for people.
>
> I'm harping on this a bit hard because many people *don't* already
> understand the use cases.
>

I've attempted to modify the language in the doc to make it clear that the
intent is for outbound traffic to go through the proxy, but that this is
often implemented by having *all* traffic go through the proxy.  Please let
me know if this addresses your concern.

>
>>> I'm not certain there's a fundamental need for special behavior between
>>> case 1 and 2 concerning the CONNECT string, but in any case, I don't see
>>> why the *proxy mapper* must do it.
>>>
>>
>> Can you say more about why you think we could use the same CONNECT
>> argument in both cases 1 and 2?
>>
>
> Use a string of what to connect to for CONNECT. Sometimes it contains an
> IP, sometimes it contains a hostname.
>

>> Case 2 is triggered from the proxy mapper, which is why the proxy mapper
>> needs to use this mechanism.
>>
>
> I'm not concerned about it using any mechanism. I wanted to reduce its
> power, and I don't think there should be any argument about whether that
> is possible.
>
>> In addition, this allows the proxy mapper to set the CONNECT argument
>> differently for the different situations in case 3.
>>
>
> And only case 3 benefits. There is no need for application-provided overriding
> of the CONNECT string in cases 1 and 2. So that's why I was trying to
> figure out an alternative solution for case 3.
>

>> And more generally, I also think it's a more flexible approach that may
>> allow users to write proxy mappers in the future to do things that we're
>> not thinking of right now.
>>
>
> "more flexible approach" != "better". Does it not concern you that the
> proxy mapper may *completely* replace the decision of the name resolver?
> That could make for a painful debugging session. I'm fine with it tweaking
> the results, but in no way do I see complete overriding to be a good thing
> inherently. If we have to do it, so be it, but it's an anti-feature if we
> support it unnecessarily.
>

Let me try to make sure I'm understanding you right here.  It sounds like
you're suggesting that instead of giving the proxy mapper the ability to
control whether the CONNECT argument is a hostname or an IP address, we
instead always assume that we should use the IP address in the CONNECT
request whenever use of a proxy is indicated by a proxy mapper.  In other
words, we would determine the CONNECT argument based on where the use of
the proxy was triggered (i.e., from the client channel code vs. from a
proxy mapper) instead of having the proxy mapper explicitly control it.  Is
that right?
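
As a rough illustration of that rule, here is a minimal sketch in Python.
None of these names (ProxyTrigger, connect_argument, connect_request) come
from gRPC or from the gRFC; they are hypothetical. Using the original
hostname when the proxy is triggered from the client channel code is an
assumption based on case 1, and the sketch ignores IPv6 bracket syntax.

```python
from enum import Enum


class ProxyTrigger(Enum):
    """Hypothetical: where the decision to use a proxy came from."""
    CLIENT_CHANNEL = "client_channel"  # e.g. the http_proxy environment variable
    PROXY_MAPPER = "proxy_mapper"


def connect_argument(trigger, hostname, resolved_ip, port):
    """Pick the CONNECT target purely from where the proxy was triggered."""
    if trigger is ProxyTrigger.PROXY_MAPPER:
        host = resolved_ip  # mapper-triggered: always the resolved IP address
    else:
        host = hostname  # assumption: channel-triggered uses the original name
    return f"{host}:{port}"


def connect_request(target):
    """Format a minimal HTTP/1.1 CONNECT request for a host:port target."""
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")
```

Under this rule the proxy mapper never sees or sets the CONNECT string; it
only decides whether a proxy is used.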

I do understand where you're coming from with wanting to limit the proxy
mapper's control, but I am not actually bothered by allowing it to have
that control.  In general, I'd rather provide more flexibility where we can
and trust people to debug their own problems when they arise.  And it's not
as though anyone who has access to add a resolver does not also have access
to add a proxy mapper, so there's no security issue.

But that philosophical debate aside, I think that we should focus on case
3, because that's a concrete case that we do want to support.  So far, at
least, I have not heard a workable proposal that does not require the proxy
mapper to control the CONNECT argument (although I'm certainly still open
to new proposals).

I can think of one possible middle-ground approach here, which is that
instead of having the proxy mapper specify the CONNECT argument string, it
just indicates whether the argument should be the original hostname or the
IP address returned by the resolver.  That way, it can control what it
needs to but can't completely override the results of the resolver.  I'm
not super enthusiastic about this approach, since it seems like it actually
makes the interface a bit harder to understand, but I'm curious what you
think of it.
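
To make the middle-ground idea concrete, here is a minimal sketch in
Python. It is not the gRPC API: ConnectArgKind, ProxyDecision, and
connect_target are hypothetical names, and the sketch assumes the hostname
and the resolved address are both passed around as "host:port" strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConnectArgKind(Enum):
    """Hypothetical hint: which value the channel should send in CONNECT."""
    ORIGINAL_HOSTNAME = 1
    RESOLVED_ADDRESS = 2


@dataclass(frozen=True)
class ProxyDecision:
    """What a proxy mapper would return when a proxy is needed."""
    proxy_address: str           # where to open the TCP connection
    connect_arg: ConnectArgKind  # a hint only, never a free-form string


def connect_target(decision: Optional[ProxyDecision],
                   hostname: str, resolved: str) -> Optional[str]:
    """Return the CONNECT argument, or None when no proxy is needed."""
    if decision is None:
        return None
    if decision.connect_arg is ConnectArgKind.ORIGINAL_HOSTNAME:
        return hostname
    return resolved
```

The mapper can only choose between two values that the channel already
has, so it cannot substitute a target of its own.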

>
>>> I'd expect the proxy mapper to return one of two things:
>>>  - no proxy needed
>>>  - use CONNECT with proxy IP x.x.x.x
>>>
>>> That gives the mapper the control it needs without opening the ability
>>> to do outrageous things.
>>>
>>> I think "when it sees the proxy address" also has fundamental issues,
>>> like requiring the proxy to have a hard-coded stable IP. That means you
>>> couldn't add a new proxy to the rotation if experiencing too much load.
>>>
>>> More likely, in your scheme, I'd expect the "proxy address" to become
>>> 100% fake. "Oh! It's 1.1.1.1! That's our secret code for proxy address."
>>>
>>
>> We discussed the possibility of using a sentinel address value like this,
>> but I think that's really ugly.  Using the proxy address seems cleaner,
>> especially since the client needs to know what proxy address to use anyway
>> in order to return that value from the proxy mapper.
>>
>
> But you didn't address my concerns that the mapping code doesn't actually
> know the proxy addresses, unless it never changes. But if it never changes
> then you have stability issues.
>

The whole point of the proxy mapper is to provide a hook for the logic that
knows what proxy address to use.  It has to have some source of that data,
whether it be hard-coded or read from a file or something else entirely.

In other words, what I'm saying is that it's up to the proxy mapper's
author to decide how it will get this data, but that the author has to
solve that problem anyway as an inherent part of writing the proxy mapper.
Therefore, using the proxy address, which the mapper already has to know
for its own logic, does not add any additional requirement.
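
As one possible shape for this, here is a minimal sketch in Python of a
mapper whose proxy address comes from a pluggable source. SketchProxyMapper,
map_address, and address_from_file are hypothetical names, not part of gRPC,
and the one-address-per-file layout is an assumption.

```python
from typing import Callable, Iterable, Optional


def address_from_file(path: str) -> Callable[[], str]:
    """Return a source that re-reads the proxy address from a file on each call."""
    def read() -> str:
        with open(path, encoding="utf-8") as f:
            return f.read().strip()
    return read


class SketchProxyMapper:
    """Maps a server address to a proxy address, or to None for 'no proxy'."""

    def __init__(self, proxy_source: Callable[[], str],
                 needs_proxy: Iterable[str]):
        self._proxy_source = proxy_source      # hard-coded, file, or anything else
        self._needs_proxy = set(needs_proxy)   # servers reachable only via proxy

    def map_address(self, server_address: str) -> Optional[str]:
        if server_address in self._needs_proxy:
            return self._proxy_source()  # looked up at call time, not cached
        return None
```

Because the source is consulted on every call, the proxy address in this
sketch does not have to be fixed for the lifetime of the process; a
hard-coded address is just the special case `lambda: "x.x.x.x:port"`.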

-- 
Mark D. Roth <r...@google.com>
Software Engineer
Google, Inc.

