Re: [PATCH] FEATURE/MAJOR: Add upstream-proxy-tunnel feature

Aleksandar Lazic Thu, 19 Sep 2024 04:05:38 -0700

Hi Willy.

any chance to look into the Answer?


Regards

Alex

On 2024-08-15 (Do.) 10:55, Aleksandar Lazic wrote:

Hi Willy.

On 2024-08-12 (Mo.) 16:49, Willy Tarreau wrote:

Hi Alex,

On Mon, Aug 12, 2024 at 11:46:37AM +0200, Aleksandar Lazic wrote:

On Thu, Jun 13, 2024 at 03:00:59AM +0200, Aleksandar Lazic wrote:

The final idea is something like this.

```
tcp-request content upstream-proxy-header Host %[req.ssl_sni,lower]
tcp-request content upstream-proxy-header "$AUTH" "$TOKEN"
tcp-request content upstream-proxy-header Proxy-Connection Keep-Alive
tcp-request content upstream-proxy-target %[req.ssl_sni,lower]

server stream 0.0.0.0:0 upstream-proxy-tunnel
%[req.ssl_sni,lower,map_str(targets.map)]
```

The targets.map should have something like this.
#dest proxy
sni01 proxy01
sni02 proxy02

I hope the background of upstream-proxy-target is now more clear.

- In `server https_Via_Proxy1 0.0.0.0:0 upstream-proxy-tunnel127.0.0.1:3128` from your config, what is 0.0.0.0:0 used for here? This binds toall IPv4
     but on a random free port?

This is required when the destination should be dynamic, it's documentedhere.

http://docs.haproxy.org/3.0/configuration.html#4.4-do-resolve

A+
Dave


Regards
Alex


Overall I find that there are plenty of good points in this patch and
the discussion around it. But it also raises questions:

    - IMHO the server's address should definitely be the proxy's address.
      This will permit to use all the standard mechanisms we have such as
      DNS resolution and health checks and continues to work as a regular
      handshake. I'm not sure this is the case in the current state of the
      patch since I'm seeing an extra "upstream-proxy-tunnel" parameter
      (actually I'm confused by what you call a "target" here, as used in
      upstream-proxy-target).


Well I thought the same but that's then difficult to separate between the
proxy server and the destination server. Maybe we need some kind of server
type?

As you can see in the sequence diagram are both servers from HAProxy pointof view.

https://github.com/git001/tls-proxy-tunnel/blob/main/README.md#sequence-diagram

The "target" is the "destination server" which is from my point of view the
"server" in haproxy.

The "upstream-proxy-target" is the upstream proxy server. I'm also not happy
with that name and open for suggestions.


Then I'm confused by what the server's configured address becomes.


Fully understand.
Let me separate the word "server" here.

The server in the server line is the final target/destination server which isthe one which the user want to reach.

The upstream proxy server, similar to the socks4 server, is just a hop serverto connect to the final target/destination server.

I mean, there are three possibilities:

   1) the ip:port of the server designates the target server, in which
      case only ip:port are passed to the proxy and we never send an FQDN
      there. Then the proxy has to be hard-coded in an option (apparently
      the upstream-proxy-tunnel here). Haproxy then focuses only on the
      target server for load balancing, stickiness, health checks etc and
      always reaches that server through the hard-coded upstream proxy.
      This could match what an application-oriente LB would look for, i.e.
      being able to reach its "local" servers through a proxy. It's just
      that I would be surprised that such a use case really exists, because
      load-balancing remote servers across a single local hard-coded proxy
      doesn't feel natural at all. E.g. the LB is not even on the LAN that
      knows all the remote hosts so such a config is particularly difficult
      to maintain in a working state.

   2) the fqdn:port of the server designates the target server, in which
      case only fqdn:port are passed to the proxy and we let the proxy
      resolve the target server. The rest is the same as above, it's just
      a small variation, but I'm still having a hard time figuring what
      use case that could match.


This is the current implementation of the patch.

The Proxy is the one which then decides if the connection to thetarget/destination server is allowed or not.

   3) the ip:port of the server designates the upstream proxy, and the
      connection's destination is passed to the CONNECT method. In this
      case it means that LB, checks, stickiness etc applies to the proxies
      and not to the servers. In this case we need a distinct field to set
      the target host name. That's more or less what you did with the
      tcp-request upstream-proxy-target rule, which could possible have a
      default on the server line. We could also imagine that a set-dst
      action would allow to force the target destination address, but that
      would be a detail. This more is more about infrastructure than remote
      app servers: you tell the proxy how to reach external services. I've
      seen this requested a few times by those saying "I would like to use
      the proxy protocol from my servers, that would connect to haproxy
      which would then connect to the forward proxy to reach the
      destination". The choice might be debatable of course but I think it
      does cover a number of situations where haproxy and the forward proxy
      farm are here to provide access to a subset of the internet for a bunch
      of local servers.

Note that in all cases, the server's address is used and designates
something different. What's important is to stop thinking about the
single-node end-to-end chain because that's the best way to make a
mistake. We need to think that we're on a load balancer and what we
want to load-balance: remote servers or local proxies. I tend to think
the latter.


The big challenge here is to get the difference of a "server".

From my point of view is the upstream proxy server similar to the socks4server which is already available in HAProxy.

Similarly, in my mind it makes sense to think that if resolvers are
enabled on the server, they resolve to the IP of the forward proxy, so
that you can have, for example, forward proxies as part of your cloud
deployment, and learn about them dynamically.

I think we really need to know what use case we're trying to address,
it's very important to know it.

The use case is that a vendor have not implemented in there application aproxy handling and therefore is the only way to talk to the destination servervia a proxy a layer 4 pass-through. The issue here is that the Proxy requiressome information where to connect and this is done with the "CONNECTfqdn:port..." line.

    - you should definitely get rid of allthese DPRINTF() debug traces.
      The simple fact that I no longer know how to enable them should ring
      a bell about their relevance ;-)


:-)

What's the suggested alternative?
I thought "*TRACE*" but haven't seen any good documentation how to use it.
I'm happy to switch to any better debugging setup.


The traces are not trivial to set up but not terribly difficult either. They
work by subsystem which we call a "trace source". That's what you see in the
CLI's "trace" command (e.g.: h1, h2, stream, quic etc). This requires the
definition of a TRACE_SOURCE macro pointing to a local definition in the
file that is touched. This is entirely for a source file, so that traces are
easy to use. From there, TRACE_ENTER() and TRACE_LEAVE() in each function
are sufficient to know where you're entering/leaving. TRACE_PRINTF() allows
to perform printf-like traces when at devel level (they're a bit expensive
since using snprintf but that's OK here). Then there are other levels
depending on the level of info wanted (e.g. TRACE_USER for traces susceptible
to be of interest for the end user, such as "connected to X"), TRACE_STATE
for anything related to internal state changes (switching to connection
pending, then connected, then closed) etc.

We don't have traces at the connection level since this is mostly utility.
Maybe we could consider having specific traces for handshakes in general
(CONNECT, PP, etc), which could possibly help. But then it would mean
moving most of the handshake-specific code to a new file (e.g.
conn-handshake.c).

You can have a look at mux-h1.c or mux-h2.c to see an example of how these
are used.


Thank you for the explanation. I will take a look into it.

    - we need to make sure this handshake is sent before SSL and not after.
      This should permit to do what users normally expect, i.e., encrypt a
      TCP connection and encapsulate it into a CONNECT request.


I thought that's how I added it, looks like I have overseen something.
 From my understanding is the call in "xprt_handshake_io_cb(...)" before the
TLS/SSL handshake, is this wrong?


At first glance, seeing xprt_handshake_io_cb(), I think this is OK. But
it's important to test it. We could discover really nasty surprises.


Okay, let's see.

    - please don't realign the srv_kw_list at the end of the patch, it makes
      it impossible to spot the addition(s) there. We can tolerate a few
      unaligned entries (even though often it's a sign that a shorter name
      would be welcome), and if you really need to realign, please do that
      in a separate patch.


Okay. I'm happy to discuss a better name.


I don't have any for now (I'm not good at giving names, I think the
project itself shows it). However I think we should make "http1" appear
in the name so that there's no ambiguity about the compatibility with
other solutions (e.g. "use-http1-proxy" or "use-http1-connect" etc might
be explicit enough, though maybe looking at how other utilities call this
could help).

:-)

    - and BTW there's no need for this "upt" entry in the proxy struct, if
      a string is needed you can just use an ist or even char* which are
      simpler. But I do think that it should be extracted from the request
      (something we need to figure how by default).


In general, is the "struct proxy{ { ... }tcp_req; }" the right one to add
this feature?
Maybe it's better to add it into the l4 or l5 ruleset?


I still don't understand how it's going to be used at the proxy level.
I thought it would be a list of rules but actually that's not the case
since the rules are in tcp-request. I just don't understand where it is
configured.

I haven't seen any good way to define a server type like "upstream",
"origin", "any-other-type" for the server line maybe this type keyword could
help to get the separation between an upstream proxy server and an
origin/target/destination server.


Maybe. I was thinking that a way to distinguish between cases 1 and 3
above is actually the transparent mode: you want to route your connection
to the origin via the proxy. But there's still something that feels odd
in this, and that's also why I think we'd all see clearer after the real
world use cases are enumerated (regardless of the config, the config will
come out of these, trying to look the most intuitive possible to cover
these cases).


Thank you for all your time.

I think the best way to understand the use is to see the upstream similar tothe socks4 or proxy protocol stuff.I's just a precondition to connect to a destination server with some handshakebefore the connection.

Willy


Best regards
Alex

Re: [PATCH] FEATURE/MAJOR: Add upstream-proxy-tunnel feature

Reply via email to