[PATCH] netfilter: xt_socket: fix transparent match for IPv6 request sockets

2016-09-23 Thread KOVACS Krisztian
The introduction of the TCP_NEW_SYN_RECV state and the addition of request
sockets to the ehash table seem to have broken the --transparent option
of the socket match for IPv6 (around commit a9407000).

Now that the socket lookup finds the TCP_NEW_SYN_RECV socket instead of the
listener, the --transparent option tries to match on the no_srccheck flag
of the request socket.

Unfortunately, that flag was only set for IPv4 sockets in tcp_v4_init_req()
by copying the transparent flag of the listener socket. This effectively
causes '-m socket --transparent' to not match on the ACK packet sent by the
client in an IPv6 TCP handshake.

Based on the suggestion from Eric Dumazet, this change moves the code
initializing no_srccheck to tcp_conn_request(), which is shared by IPv4
and IPv6, making the above scenario work again.

Fixes: a9407000 ("netfilter: xt_socket: prepare for TCP_NEW_SYN_RECV support")
Signed-off-by: Alex Badics 
Signed-off-by: KOVACS Krisztian 
---
 net/ipv4/tcp_input.c | 1 +
 net/ipv4/tcp_ipv4.c  | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3ebf45b..1fb2e82 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6260,6 +6260,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 
tmp_opt.tstamp_ok = tmp_opt.saw_tstamp;
tcp_openreq_init(req, &tmp_opt, skb, sk);
+   inet_rsk(req)->no_srccheck = inet_sk(sk)->transparent;
 
/* Note: tcp_v6_init_req() might override ir_iif for link locals */
inet_rsk(req)->ir_iif = inet_request_bound_dev_if(sk, skb);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7158d4f..b448eb9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1195,7 +1195,6 @@ static void tcp_v4_init_req(struct request_sock *req,
 
sk_rcv_saddr_set(req_to_sk(req), ip_hdr(skb)->daddr);
sk_daddr_set(req_to_sk(req), ip_hdr(skb)->saddr);
-   ireq->no_srccheck = inet_sk(sk_listener)->transparent;
ireq->opt = tcp_v4_save_options(skb);
 }
 
-- 
2.10.0
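
The effect of the change above can be sketched in user space. This is a toy model, not kernel code: the toy_* structs and functions are hypothetical stand-ins for the listener socket, the request socket, the per-family init hooks and tcp_conn_request(). It only illustrates why setting the flag in the common path covers both families.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the listener and request sockets. */
struct toy_listener {
	bool transparent;	/* models inet_sk(sk)->transparent */
};

struct toy_request {
	int family;		/* 4 or 6, set by the per-family hook */
	bool no_srccheck;	/* models inet_rsk(req)->no_srccheck */
};

/* Per-family hooks: after the patch neither of them touches no_srccheck. */
static void toy_v4_init_req(struct toy_request *req) { req->family = 4; }
static void toy_v6_init_req(struct toy_request *req) { req->family = 6; }

/* Common path, standing in for tcp_conn_request(). */
static void toy_conn_request(struct toy_request *req,
			     const struct toy_listener *listener, int family)
{
	/* Copy the listener's flag once, for every address family. */
	req->no_srccheck = listener->transparent;

	if (family == 6)
		toy_v6_init_req(req);
	else
		toy_v4_init_req(req);
}

/* What a --transparent style check would see on the request socket. */
static bool toy_transparent_match(const struct toy_request *req)
{
	return req->no_srccheck;
}
```

With a transparent listener, toy_transparent_match() returns true for requests created with family 4 as well as family 6, which is the behaviour the patch restores for IPv6.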



Re: [PATCH 6/6]: [CASSINI]: Bump driver version and release date.

2008-01-04 Thread KOVACS Krisztian
Hi,

On Fri, Jan 04, 2008 at 12:35:12 -0800, David Miller wrote:
> [CASSINI]: Bump driver version and release date.
> 
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
> ---
>  drivers/net/cassini.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/cassini.c b/drivers/net/cassini.c
> index c3220e4..3a9bb17 100644
> --- a/drivers/net/cassini.c
> +++ b/drivers/net/cassini.c
> @@ -142,8 +142,8 @@
>  
>  #define DRV_MODULE_NAME  "cassini"
>  #define PFX DRV_MODULE_NAME  ": "
> -#define DRV_MODULE_VERSION   "1.4"
> -#define DRV_MODULE_RELDATE   "1 July 2004"
> +#define DRV_MODULE_VERSION   "1.5"
> +#define DRV_MODULE_RELDATE   "4 Jan 2007"

Erm, 2008?

>  #define CAS_DEF_MSG_ENABLE \
>   (NETIF_MSG_DRV  | \
> -- 
> 1.5.4.rc2.17.g257f
> 

-- 
KOVACS Krisztian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cassini driver skb->truesize bug

2007-12-05 Thread KOVACS Krisztian
Hi,

On Wed, Dec 05, 2007 at 11:23:46AM +0100, Balazs Scheidler wrote:
> Some more investigation revealed that the cassini driver leaks the data
> portion of all RXed packets, this makes the driver completely unusable.
> 
> We've tested the following combinations:
>  * 2.6.17 (patched, but no cassini related patches)
>  * 2.6.22 Ubuntu Gutsy.
> 
> It still worked in 2.6.12 where we originally backported the driver from
> 2.6.14.
> 
> The sk_buff count in slabinfo stays normal, so the skbs are properly
> freed. I'm suspicious about all these cas_page_t wrappers.

Commit fa4f0774d7c6cccb4d1fda76b91dd8eddcb2dd6a?

I don't really see how the buffer count for a page used as a fragment gets
decreased when the skb is freed.

-- 
KOVACS Krisztian


Re: [PATCH 00/14] Transparent Proxying Patches, Take 5

2007-10-14 Thread KOVACS Krisztian
Hi David,

On Sunday 14 October 2007, David Miller wrote:
> From: KOVACS Krisztian <[EMAIL PROTECTED]>
> Date: Sat, 13 Oct 2007 19:28:57 +0200
>
> > This is the fifth round of transparent proxying patches following
> > recent discussion on netfilter-devel [1,2].
> >
> > The aim of the patchset is to make non-locally bound sockets work
> > both for receiving and sending. The target is IPv4 TCP/UDP at the
> > moment.
>
> I appreciate the submission, but the 2.6.25 merge window is so far
> away that I'm personally not really going to look seriously into any
> non-trivial new work like this until we sort out all the regressions
> we've already added this week for the 2.6.24 merge window :-)

Sure, definitely makes sense. I'll resend these once things have settled
down with 2.6.24.

-- 
 KOVACS Krisztian


[PATCH 09/14] iptables tproxy core

2007-10-13 Thread KOVACS Krisztian
The iptables tproxy core is a module that contains the common routines used by
the various tproxy-related modules (TPROXY target and socket match).

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---

 include/net/netfilter/nf_tproxy_core.h |   32 +++
 net/netfilter/Kconfig  |   13 
 net/netfilter/Makefile |3 +
 net/netfilter/nf_tproxy_core.c |   96 
 4 files changed, 144 insertions(+), 0 deletions(-)

diff --git a/include/net/netfilter/nf_tproxy_core.h b/include/net/netfilter/nf_tproxy_core.h
new file mode 100644
index 000..2fac3ad
--- /dev/null
+++ b/include/net/netfilter/nf_tproxy_core.h
@@ -0,0 +1,32 @@
+#ifndef _NF_TPROXY_CORE_H
+#define _NF_TPROXY_CORE_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* look up and get a reference to a matching socket */
+extern struct sock *
+nf_tproxy_get_sock_v4(const u8 protocol,
+ const __be32 saddr, const __be32 daddr,
+ const __be16 sport, const __be16 dport,
+ const struct net_device *in, bool listening);
+
+static inline void
+nf_tproxy_put_sock(struct sock *sk)
+{
+   /* TIME_WAIT inet sockets have to be handled differently */
+   if ((sk->sk_protocol == IPPROTO_TCP) && (sk->sk_state == TCP_TIME_WAIT))
+   inet_twsk_put(inet_twsk(sk));
+   else
+   sock_put(sk);
+}
+
+/* assign a socket to the skb -- consumes sk */
+int
+nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk);
+
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index d7a600a..5bb4afb 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -257,6 +257,19 @@ config NF_CT_NETLINK
help
  This option enables support for a netlink-based userspace interface
 
+# transparent proxy support
+config NETFILTER_TPROXY
+   tristate "Transparent proxying support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL && IP_NF_MANGLE
+   help
+ This option enables transparent proxying support, that is,
+ support for handling non-locally bound IPv4 TCP and UDP sockets.
+ For it to work you will have to configure certain iptables rules
+ and use policy routing. For more information on how to set it up
+ see Documentation/networking/tproxy.txt.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XTABLES
tristate "Netfilter Xtables support (required for ip_tables)"
help
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 93c58f9..5066297 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -34,6 +34,9 @@ obj-$(CONFIG_NF_CONNTRACK_SANE) += nf_conntrack_sane.o
 obj-$(CONFIG_NF_CONNTRACK_SIP) += nf_conntrack_sip.o
 obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o
 
+# transparent proxy support
+obj-$(CONFIG_NETFILTER_TPROXY) += nf_tproxy_core.o
+
 # generic X tables 
 obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o
 
diff --git a/net/netfilter/nf_tproxy_core.c b/net/netfilter/nf_tproxy_core.c
new file mode 100644
index 000..1a25c61
--- /dev/null
+++ b/net/netfilter/nf_tproxy_core.c
@@ -0,0 +1,96 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2006-2007 BalaBit IT Ltd.
+ * Author: Balazs Scheidler, Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct sock *
+nf_tproxy_get_sock_v4(const u8 protocol,
+ const __be32 saddr, const __be32 daddr,
+ const __be16 sport, const __be16 dport,
+ const struct net_device *in, bool listening_only)
+{
+   struct sock *sk;
+
+   /* look up socket */
+   switch (protocol) {
+   case IPPROTO_TCP:
+   if (listening_only)
+   sk = __inet_lookup_listener(&tcp_hashinfo,
+   daddr, ntohs(dport),
+   in->ifindex);
+   else
+   sk = __inet_lookup(&tcp_hashinfo,
+  saddr, sport, daddr, dport,
+  in->ifindex);
+   break;
+   case IPPROTO_UDP:
+   sk = udp4_lib_lookup(saddr, sport, daddr, dport,
+in->ifindex);
+   break;
+   default:
+   WARN_ON(1);
+   sk = NULL;
+   }
+
+   pr_debug("tproxy socket lookup: proto %u %08x:%u -> %08x:%u sock %p\n",
+protocol, ntohl(saddr), ntohs(sport)

[PATCH 07/14] Export UDP socket lookup function

2007-10-13 Thread KOVACS Krisztian
The iptables tproxy code has to be able to do UDP socket hash lookups,
so we have to provide an exported lookup function for this purpose.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---

 include/net/udp.h |4 
 net/ipv4/udp.c|8 
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 98755eb..3efae7d 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -138,6 +138,10 @@ extern int udp_lib_setsockopt(struct sock *sk, int level, int optname,
   char __user *optval, int optlen,
   int (*push_pending_frames)(struct sock *));
 
+extern struct sock *udp4_lib_lookup(__be32 saddr, __be16 sport,
+   __be32 daddr, __be16 dport,
+   int dif);
+
 DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
 /*
  * SNMP statistics for UDP and UDP-Lite
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cb9fc58..053d5c4 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -294,6 +294,14 @@ static struct sock *__udp4_lib_lookup(__be32 saddr, __be16 sport,
return result;
 }
 
+struct sock *udp4_lib_lookup(__be32 saddr, __be16 sport,
+  __be32 daddr, __be16 dport,
+  int dif)
+{
+   return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udp_hash);
+}
+EXPORT_SYMBOL_GPL(udp4_lib_lookup);
+
 static inline struct sock *udp_v4_mcast_next(struct sock *sk,
 __be16 loc_port, __be32 loc_addr,
 __be16 rmt_port, __be32 rmt_addr,



[PATCH 08/14] Split Netfilter IPv4 defragmentation into a separate module

2007-10-13 Thread KOVACS Krisztian
Netfilter connection tracking requires all IPv4 packets to be defragmented.
Both the socket match and the TPROXY target depend on this functionality, so
this patch separates the Netfilter IPv4 defrag hooks into a separate module.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---

 include/net/netfilter/ipv4/nf_defrag_ipv4.h|6 ++
 net/ipv4/netfilter/Kconfig |5 +
 net/ipv4/netfilter/Makefile|3 +
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |   55 +-
 net/ipv4/netfilter/nf_defrag_ipv4.c|   94 
 5 files changed, 110 insertions(+), 53 deletions(-)

diff --git a/include/net/netfilter/ipv4/nf_defrag_ipv4.h b/include/net/netfilter/ipv4/nf_defrag_ipv4.h
new file mode 100644
index 000..6b00ea3
--- /dev/null
+++ b/include/net/netfilter/ipv4/nf_defrag_ipv4.h
@@ -0,0 +1,6 @@
+#ifndef _NF_DEFRAG_IPV4_H
+#define _NF_DEFRAG_IPV4_H
+
+extern void nf_defrag_ipv4_enable(void);
+
+#endif /* _NF_DEFRAG_IPV4_H */
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index fa97947..c9108de 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -5,9 +5,14 @@
 menu "IP: Netfilter Configuration"
depends on INET && NETFILTER
 
+config NF_DEFRAG_IPV4
+   tristate
+   default n
+
 config NF_CONNTRACK_IPV4
tristate "IPv4 connection tracking support (required for NAT)"
depends on NF_CONNTRACK
+   select NF_DEFRAG_IPV4
---help---
  Connection tracking keeps a record of what packets have passed
  through your machine, in order to figure out how they are related
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 409d273..6504de5 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -18,6 +18,9 @@ obj-$(CONFIG_NF_CONNTRACK_IPV4) += nf_conntrack_ipv4.o
 
 obj-$(CONFIG_NF_NAT) += nf_nat.o
 
+# defrag
+obj-$(CONFIG_NF_DEFRAG_IPV4) += nf_defrag_ipv4.o
+
 # NAT helpers (nf_conntrack)
 obj-$(CONFIG_NF_NAT_AMANDA) += nf_nat_amanda.o
 obj-$(CONFIG_NF_NAT_FTP) += nf_nat_ftp.o
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 2fcb924..cbc5b56 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int ipv4_pkt_to_tuple(const struct sk_buff *skb, unsigned int nhoff,
 struct nf_conntrack_tuple *tuple)
@@ -62,22 +63,6 @@ static int ipv4_print_conntrack(struct seq_file *s,
return 0;
 }
 
-/* Returns new sk_buff, or NULL */
-static struct sk_buff *
-nf_ct_ipv4_gather_frags(struct sk_buff *skb, u_int32_t user)
-{
-   skb_orphan(skb);
-
-   local_bh_disable();
-   skb = ip_defrag(skb, user);
-   local_bh_enable();
-
-   if (skb)
-   ip_send_check(ip_hdr(skb));
-
-   return skb;
-}
-
 static int ipv4_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
unsigned int *dataoff, u_int8_t *protonum)
 {
@@ -135,29 +120,6 @@ static unsigned int ipv4_conntrack_help(unsigned int hooknum,
ct, ctinfo);
 }
 
-static unsigned int ipv4_conntrack_defrag(unsigned int hooknum,
- struct sk_buff **pskb,
- const struct net_device *in,
- const struct net_device *out,
- int (*okfn)(struct sk_buff *))
-{
-   /* Previously seen (loopback)?  Ignore.  Do this before
-  fragment check. */
-   if ((*pskb)->nfct)
-   return NF_ACCEPT;
-
-   /* Gather fragments. */
-   if (ip_hdr(*pskb)->frag_off & htons(IP_MF | IP_OFFSET)) {
-   *pskb = nf_ct_ipv4_gather_frags(*pskb,
-   hooknum == NF_IP_PRE_ROUTING ?
-   IP_DEFRAG_CONNTRACK_IN :
-   IP_DEFRAG_CONNTRACK_OUT);
-   if (!*pskb)
-   return NF_STOLEN;
-   }
-   return NF_ACCEPT;
-}
-
 static unsigned int ipv4_conntrack_in(unsigned int hooknum,
  struct sk_buff **pskb,
  const struct net_device *in,
@@ -187,13 +149,6 @@ static unsigned int ipv4_conntrack_local(unsigned int hooknum,
make it the first hook. */
 static struct nf_hook_ops ipv4_conntrack_ops[] = {
{
-   .hook   = ipv4_conntrack_defrag,
-   .owner  = THIS_MODULE,
-   .pf = PF_INET,
-   .hooknum= NF_IP_PRE_ROUTING,
-   .priority   = NF_IP_PRI_CONNTRACK_DEFRAG,
-   },
-   {
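
The fragment test that moves into the new defrag module can be tried in user space. The sketch below assumes the usual IPv4 header layout; the TOY_* constants mirror the kernel's IP_MF and IP_OFFSET values and are redefined locally so the example builds outside the kernel. The function name is hypothetical.

```c
#include <arpa/inet.h>	/* htons() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the kernel's IP_MF and IP_OFFSET values. */
#define TOY_IP_MF	0x2000	/* "more fragments" flag */
#define TOY_IP_OFFSET	0x1FFF	/* 13-bit fragment offset mask */

/*
 * frag_off_be is the frag_off field of the IPv4 header, still in
 * network byte order. A packet needs reassembly if it has the MF bit
 * set or carries a non-zero fragment offset.
 */
static bool toy_is_fragment(uint16_t frag_off_be)
{
	return (frag_off_be & htons(TOY_IP_MF | TOY_IP_OFFSET)) != 0;
}
```

A header with only the "don't fragment" bit set (0x4000) is not a fragment, while one with MF set or with any non-zero offset is. These are the packets the hook hands to ip_defrag() before the socket match or TPROXY target sees them.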
   

[PATCH 06/14] Port redirection support for TCP

2007-10-13 Thread KOVACS Krisztian
Current TCP code relies on the local port of the listening socket
being the same as the destination port of the incoming
connection. Port redirection used by many transparent proxying
techniques obviously breaks this, so we have to store the original
destination port.

This patch extends struct inet_request_sock and stores the incoming
destination port value there. It also modifies the handshake code to
use that value as the source port when sending reply packets.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---

 include/net/inet_sock.h |2 +-
 include/net/tcp.h   |1 +
 net/ipv4/inet_connection_sock.c |2 ++
 net/ipv4/syncookies.c   |1 +
 net/ipv4/tcp_output.c   |2 +-
 5 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 517efe7..d7e2a52 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -61,8 +61,8 @@ struct inet_request_sock {
struct request_sock req;
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
u16 inet6_rsk_offset;
-   /* 2 bytes hole, try to pack */
 #endif
+   __be16  loc_port;
__be32  loc_addr;
__be32  rmt_addr;
__be16  rmt_port;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 92049e6..13bd06f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1006,6 +1006,7 @@ static inline void tcp_openreq_init(struct request_sock *req,
ireq->acked = 0;
ireq->ecn_ok = 0;
ireq->rmt_port = tcp_hdr(skb)->source;
+   ireq->loc_port = tcp_hdr(skb)->dest;
 }
 
 extern void tcp_enter_memory_pressure(void);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 1667cd8..eda765f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -515,6 +515,8 @@ struct sock *inet_csk_clone(struct sock *sk, const struct request_sock *req,
newicsk->icsk_bind_hash = NULL;
 
inet_sk(newsk)->dport = inet_rsk(req)->rmt_port;
+   inet_sk(newsk)->num = ntohs(inet_rsk(req)->loc_port);
+   inet_sk(newsk)->sport = inet_rsk(req)->loc_port;
newsk->sk_write_space = sk_stream_write_space;
 
newicsk->icsk_retransmits = 0;
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index a0f6fdb..6e84243 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -223,6 +223,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
treq->rcv_isn   = ntohl(th->seq) - 1;
treq->snt_isn   = cookie;
req->mss= mss;
+   ireq->loc_port  = th->dest;
ireq->rmt_port  = th->source;
ireq->loc_addr  = ip_hdr(skb)->daddr;
ireq->rmt_addr  = ip_hdr(skb)->saddr;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 324b420..b27535f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2218,7 +2218,7 @@ struct sk_buff * tcp_make_synack(struct sock *sk, struct dst_entry *dst,
th->syn = 1;
th->ack = 1;
TCP_ECN_make_synack(req, th);
-   th->source = inet_sk(sk)->sport;
+   th->source = ireq->loc_port;
th->dest = ireq->rmt_port;
TCP_SKB_CB(skb)->seq = tcp_rsk(req)->snt_isn;
TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + 1;
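
The difference the stored port makes for the SYN+ACK can be shown with a small user-space model. The toy_* types and functions are hypothetical and use host byte order for readability; they only mirror the rmt_port and loc_port fields and the change in tcp_make_synack() shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

struct toy_listener {
	uint16_t sport;		/* port the proxy actually listens on */
};

struct toy_request {
	uint16_t rmt_port;	/* source port of the incoming SYN */
	uint16_t loc_port;	/* destination port of the incoming SYN */
};

/* Record both ports from the SYN, as tcp_openreq_init() does after the patch. */
static void toy_openreq_init(struct toy_request *req,
			     uint16_t syn_source, uint16_t syn_dest)
{
	req->rmt_port = syn_source;
	req->loc_port = syn_dest;
}

/* Before the patch: the SYN+ACK source port is the listener's port. */
static uint16_t toy_synack_source_old(const struct toy_listener *l)
{
	return l->sport;
}

/* After the patch: it is the port the client actually addressed. */
static uint16_t toy_synack_source_new(const struct toy_request *req)
{
	return req->loc_port;
}
```

For a proxy listening on port 50080 that receives a redirected SYN addressed to port 80, the old rule would answer from 50080, which the client does not expect, while the new rule answers from 80.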



[PATCH 03/14] Allow binding to non-local addresses if IP_TRANSPARENT is set

2007-10-13 Thread KOVACS Krisztian
Setting IP_TRANSPARENT is not really useful without allowing non-local
binds for the socket. To make user-space code simpler we allow these binds
even if IP_TRANSPARENT is set but IP_FREEBIND is not.

Signed-off-by: Tóth László Attila <[EMAIL PROTECTED]>
Acked-by: Patrick McHardy <[EMAIL PROTECTED]>
---

 net/ipv4/af_inet.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 621b128..4049a74 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -451,7 +451,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 */
err = -EADDRNOTAVAIL;
if (!sysctl_ip_nonlocal_bind &&
-   !inet->freebind &&
+   !(inet->freebind || inet->transparent) &&
addr->sin_addr.s_addr != INADDR_ANY &&
chk_addr_ret != RTN_LOCAL &&
chk_addr_ret != RTN_MULTICAST &&
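
The modified condition can be read as a predicate over a handful of booleans. The sketch below is a user-space model with hypothetical toy_* names. It covers only the terms visible in the hunk above; the hunk is cut off after the RTN_MULTICAST comparison, so any further terms of the real condition are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified address classes; models the value of chk_addr_ret. */
enum toy_addr_type { TOY_RTN_LOCAL, TOY_RTN_MULTICAST, TOY_RTN_UNICAST };

struct toy_bind_ctx {
	bool nonlocal_bind_sysctl;	/* models sysctl_ip_nonlocal_bind */
	bool freebind;			/* models inet->freebind */
	bool transparent;		/* models inet->transparent */
	bool addr_is_any;		/* address equals INADDR_ANY */
	enum toy_addr_type addr_type;
};

/* True if the bind would be refused with -EADDRNOTAVAIL. */
static bool toy_bind_refused(const struct toy_bind_ctx *c)
{
	return !c->nonlocal_bind_sysctl &&
	       !(c->freebind || c->transparent) &&
	       !c->addr_is_any &&
	       c->addr_type != TOY_RTN_LOCAL &&
	       c->addr_type != TOY_RTN_MULTICAST;
}
```

Binding to a foreign unicast address is refused when neither flag is set and allowed as soon as either freebind or transparent is set, so user space no longer needs to set IP_FREEBIND in addition to IP_TRANSPARENT.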



[PATCH 04/14] Conditionally enable transparent flow flag when connecting

2007-10-13 Thread KOVACS Krisztian
Set FLOWI_FLAG_ANYSRC in flowi->flags if the socket has the
transparent socket option set. This way we selectively enable certain
connections with non-local source addresses to be routed.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---

 include/net/route.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 88fed3c..9788cc2 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -158,6 +158,10 @@ static inline int ip_route_connect(struct rtable **rp, __be32 dst,
 .dport = dport } } };
 
int err;
+
+   if (inet_sk(sk)->transparent)
+   fl.flags |= FLOWI_FLAG_ANYSRC;
+
if (!dst || !src) {
err = __ip_route_output_key(rp, &fl);
if (err)



[PATCH 05/14] Handle TCP SYN+ACK/ACK/RST transparency

2007-10-13 Thread KOVACS Krisztian
The TCP stack sends out SYN+ACK/ACK/RST reply packets in response to
incoming packets. The non-local source address check on output bites
us again, as replies for transparently redirected traffic won't have a
chance to leave the node.

This patch selectively sets the FLOWI_FLAG_ANYSRC flag when doing
the route lookup for those replies. Transparent replies are enabled if
the listening socket has the transparent socket flag set.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---

 include/net/inet_sock.h |8 +++-
 include/net/ip.h|9 +
 net/ipv4/inet_connection_sock.c |1 +
 net/ipv4/ip_output.c|4 +++-
 net/ipv4/syncookies.c   |1 +
 net/ipv4/tcp_ipv4.c |   11 ---
 6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index e86832d..517efe7 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -72,7 +72,8 @@ struct inet_request_sock {
sack_ok: 1,
wscale_ok  : 1,
ecn_ok : 1,
-   acked  : 1;
+   acked  : 1,
+   no_srccheck: 1;
struct ip_options   *opt;
 };
 
@@ -191,4 +192,9 @@ static inline int inet_sk_ehashfn(const struct sock *sk)
return inet_ehashfn(laddr, lport, faddr, fport);
 }
 
+static inline __u8 inet_sk_flowi_flags(const struct sock *sk)
+{
+   return inet_sk(sk)->transparent ? FLOWI_FLAG_ANYSRC : 0;
+}
+
 #endif /* _INET_SOCK_H */
diff --git a/include/net/ip.h b/include/net/ip.h
index 3af3ed9..5ea3813 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -29,6 +29,7 @@
 
 #include 
 #include 
+#include 
 
 struct sock;
 
@@ -140,12 +141,20 @@ static inline void ip_tr_mc_map(__be32 addr, char *buf)
 
 struct ip_reply_arg {
struct kvec iov[1];   
+   int flags;
__wsum  csum;
int csumoffset; /* u16 offset of csum in iov[0].iov_base */
/* -1 if not needed */ 
int bound_dev_if;
 }; 
 
+#define IP_REPLY_ARG_NOSRCCHECK 1
+
+static inline __u8 ip_reply_arg_flowi_flags(const struct ip_reply_arg *arg)
+{
+   return (arg->flags & IP_REPLY_ARG_NOSRCCHECK) ? FLOWI_FLAG_ANYSRC : 0;
+}
+
 void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *arg,
   unsigned int len); 
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 3cef128..1667cd8 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -335,6 +335,7 @@ struct dst_entry* inet_csk_route_req(struct sock *sk,
.saddr = ireq->loc_addr,
.tos = RT_CONN_FLAGS(sk) } },
.proto = sk->sk_protocol,
+   .flags = inet_sk_flowi_flags(sk),
.uli_u = { .ports =
   { .sport = inet_sk(sk)->sport,
 .dport = ireq->rmt_port } } };
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 699f067..62b31ae 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -322,6 +322,7 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
.saddr = inet->saddr,
.tos = 
RT_CONN_FLAGS(sk) } },
.proto = sk->sk_protocol,
+   .flags = inet_sk_flowi_flags(sk),
.uli_u = { .ports =
   { .sport = inet->sport,
 .dport = inet->dport } 
} };
@@ -1368,7 +1369,8 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
.uli_u = { .ports =
   { .sport = tcp_hdr(skb)->dest,
 .dport = tcp_hdr(skb)->source 
} },
-   .proto = sk->sk_protocol };
+   .proto = sk->sk_protocol,
+   .flags = ip_reply_arg_flowi_flags(arg) };
security_skb_classify_flow(skb, &fl);
if (ip_route_output_key(&rt, &fl))
return;
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 2da1be0..a0f6fdb 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -260,6 +260,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
  

[PATCH 02/14] Implement IP_TRANSPARENT socket option

2007-10-13 Thread KOVACS Krisztian
This patch introduces the IP_TRANSPARENT socket option: enabling that will make
the IPv4 routing omit the non-local source address check on output. Setting
IP_TRANSPARENT requires NET_ADMIN capability.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
Acked-by: Patrick McHardy <[EMAIL PROTECTED]>
---

 include/linux/in.h   |1 +
 include/net/inet_sock.h  |3 ++-
 include/net/inet_timewait_sock.h |3 ++-
 include/net/route.h  |1 +
 net/ipv4/inet_timewait_sock.c|1 +
 net/ipv4/ip_sockglue.c   |   12 +++-
 6 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/in.h b/include/linux/in.h
index 3975cbf..d8c55ab 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -75,6 +75,7 @@ struct in_addr {
 #define IP_IPSEC_POLICY16
 #define IP_XFRM_POLICY 17
 #define IP_PASSSEC 18
+#define IP_TRANSPARENT 19
 
 /* BSD compatibility */
 #define IP_RECVRETOPTS IP_RETOPTS
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 62daf21..e86832d 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -128,7 +128,8 @@ struct inet_sock {
is_icsk:1,
freebind:1,
hdrincl:1,
-   mc_loop:1;
+   mc_loop:1,
+   transparent:1;
int mc_index;
__be32  mc_addr;
struct ip_mc_socklist   *mc_list;
diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index abaff05..6cf717f 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -127,7 +127,8 @@ struct inet_timewait_sock {
__be16  tw_dport;
__u16   tw_num;
/* And these are ours. */
-   __u8tw_ipv6only:1;
+   __u8tw_ipv6only:1,
+   tw_transparent:1;
/* 15 bits hole, try to pack */
__u16   tw_ipv6_offset;
int tw_timeout;
diff --git a/include/net/route.h b/include/net/route.h
index f7ce625..88fed3c 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 4e189e2..9e74c8d 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -107,6 +107,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int stat
tw->tw_reuse= sk->sk_reuse;
tw->tw_hash = sk->sk_hash;
tw->tw_ipv6only = 0;
+   tw->tw_transparent  = inet->transparent;
tw->tw_prot = sk->sk_prot_creator;
atomic_set(&tw->tw_refcnt, 1);
inet_twsk_dead_node_init(tw);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index f51f20e..f750620 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -420,7 +420,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 (1<= sizeof(int)) {
@@ -885,6 +885,16 @@ static int do_ip_setsockopt(struct sock *sk, int level,
err = xfrm_user_policy(sk, optname, optval, optlen);
break;
 
+   case IP_TRANSPARENT:
+   if (!capable(CAP_NET_ADMIN)) {
+   err = -EPERM;
+   break;
+   }
+   if (optlen < 1)
+   goto e_inval;
+   inet->transparent = !!val;
+   break;
+
default:
err = -ENOPROTOOPT;
break;
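
From user space, the new option is set like any other IP-level socket option. The helper below is a minimal sketch: the function name is hypothetical, the fallback definition uses the value 19 introduced by this patch, and the call is expected to fail with EPERM for processes without CAP_NET_ADMIN.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19	/* value introduced by this patch */
#endif

/*
 * Enable IP_TRANSPARENT on an IPv4 socket.
 * Returns 0 on success or a negative errno value on failure.
 */
static int toy_enable_transparent(int fd)
{
	int one = 1;

	if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0)
		return -errno;
	return 0;
}
```

A proxy would call this on its listening socket before bind(). Checking the return value matters, because an unprivileged process gets an error instead of a transparent socket.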



[PATCH 00/14] Transparent Proxying Patches, Take 5

2007-10-13 Thread KOVACS Krisztian
Hi Dave,

This is the fifth round of transparent proxying patches following
recent discussion on netfilter-devel [1,2].

The aim of the patchset is to make non-locally bound sockets work both
for receiving and sending. The target is IPv4 TCP/UDP at the moment.

Speaking of the patches, there are two big parts:

 * Output path (patches 1-6): these modifications make it possible to
   send IPv4 datagrams with non-local source IP address by:

   - Introducing a new flowi flag (FLOWI_FLAG_ANYSRC) which disables
 source address checking in ip_route_output_slow(). This is
 also necessary for some of the tricks LVS does. [3]

   - Adding the IP_TRANSPARENT socket option (setting this requires
 CAP_NET_ADMIN to prevent source address spoofing).

   - Gluing these together across the TCP/UDP code.

 * Input path (patches 7-13): these changes add redirection support
   for TCP along with an iptables target implementing NAT-less traffic
   interception, and an iptables match to make ahead-of-time socket
   lookups on PREROUTING. These combined with a set of iptables rules
   and policy routing make non-locally bound sockets work.

   - Netfilter IPv4 defragmentation is split into a separate
 module. It's not particularly pretty but I see no other way of
 making sure the 'socket' match gets no fragmented IPv4 packets.

   - The 'socket' iptables match does a socket lookup on the
 destination address and matches if a socket was found.

   - The 'TPROXY' iptables target provides a way to intercept traffic
 without NAT -- it does an ahead-of-time socket lookup on the
 configured address and caches the socket reference in the skb.

   - IPv4 TCP and UDP input path is modified to use this stored socket
 reference if it's present.

The last patch adds a short intro on how to use it. A trivial patch
for netcat demonstrating the necessary modifications for proxies is
available separately at [4].


References:
[1] http://marc.info/?l=netfilter-devel&m=119118672703285&w=2
[2] http://marc.info/?l=netfilter-devel&m=119135774918622&w=2
[3] http://marc.info/?l=linux-netdev&m=118065358510836&w=2
[4] http://people.netfilter.org/hidden/tproxy/netcat-ip_transparent-support.patch

-- 
KOVACS Krisztian


[PATCH 01/14] Loosen source address check on IPv4 output

2007-10-13 Thread KOVACS Krisztian
ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. This obviously makes using
such addresses impossible.

This patch introduces a flowi flag which makes omitting this check
possible. The new flag provides a way of handling transparent and
non-transparent connections differently.

Signed-off-by: Julian Anastasov <[EMAIL PROTECTED]>
Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
Acked-by: Patrick McHardy <[EMAIL PROTECTED]>
---

 include/net/flow.h |1 +
 net/ipv4/route.c   |   20 +---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index af59fa5..c734d50 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -49,6 +49,7 @@ struct flowi {
__u8proto;
__u8flags;
 #define FLOWI_FLAG_MULTIPATHOLDROUTE 0x01
+#define FLOWI_FLAG_ANYSRC 0x02
union {
struct {
__be16  sport;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 21b12de..6f7e4cb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2155,11 +2155,6 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
ZERONET(oldflp->fl4_src))
goto out;
 
-   /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
-   dev_out = ip_dev_find(oldflp->fl4_src);
-   if (dev_out == NULL)
-   goto out;
-
/* I removed check for oif == dev_out->oif here.
   It was wrong for two reasons:
   1. ip_dev_find(saddr) can return wrong iface, if saddr is
@@ -2170,6 +2165,11 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
 
if (oldflp->oif == 0
&& (MULTICAST(oldflp->fl4_dst) || oldflp->fl4_dst == 
htonl(0x))) {
+   /* It is equivalent to inet_addr_type(saddr) == 
RTN_LOCAL */
+   dev_out = ip_dev_find(oldflp->fl4_src);
+   if (dev_out == NULL)
+   goto out;
+
/* Special hack: user can direct multicasts
   and limited broadcast via necessary interface
   without fiddling with IP_MULTICAST_IF or IP_PKTINFO.
@@ -2188,9 +2188,15 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
fl.oif = dev_out->ifindex;
goto make_route;
}
-   if (dev_out)
+
+   if (!(oldflp->flags & FLOWI_FLAG_ANYSRC)) {
+   /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+   dev_out = ip_dev_find(oldflp->fl4_src);
+   if (dev_out == NULL)
+   goto out;
dev_put(dev_out);
-   dev_out = NULL;
+   dev_out = NULL;
+   }
}
 
 



[PATCH 13/14] Don't lookup the socket if there's a socket attached to the skb

2007-10-13 Thread KOVACS Krisztian
Use the socket cached in the TPROXY target if it's present.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---
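Reader's note: the "steal reference" pattern in the hunk below can be
tried out in user space. This is a minimal sketch with stand-in types,
not the kernel structures. get_socket(), lookup_fn and lookup_none() are
hypothetical names; the real fallback is __udp4_lib_lookup().

```c
#include <assert.h>
#include <stddef.h>

struct sock { int id; };                 /* stand-in for the kernel's struct sock */

struct sk_buff {                         /* only the fields this sketch touches */
    struct sock *sk;
    void (*destructor)(struct sk_buff *);
};

/* Hypothetical fallback lookup type; the patch calls __udp4_lib_lookup(). */
typedef struct sock *(*lookup_fn)(const struct sk_buff *);

/* Example fallback that never finds a socket. */
struct sock *lookup_none(const struct sk_buff *skb)
{
    (void)skb;
    return NULL;
}

struct sock *get_socket(struct sk_buff *skb, lookup_fn lookup)
{
    if (skb->sk) {
        /* steal reference: the caller now owns what the skb held */
        struct sock *sk = skb->sk;
        skb->destructor = NULL;
        skb->sk = NULL;
        return sk;
    }
    /* no cached socket: fall back to the normal lookup */
    return lookup(skb);
}
```

Clearing skb->sk and the destructor in the same step keeps the reference
from being dropped a second time when the skb is freed.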

 net/ipv4/udp.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 053d5c4..6592689 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1158,6 +1158,14 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct hlist_head udptable[],
if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udptable);
 
+#if defined(CONFIG_NETFILTER_TPROXY) || defined(CONFIG_NETFILTER_TPROXY_MODULE)
+   if (unlikely(skb->sk)) {
+   /* steal reference */
+   sk = skb->sk;
+   skb->destructor = NULL;
+   skb->sk = NULL;
+   } else
+#endif
sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
   skb->dev->ifindex, udptable);
 



[PATCH 14/14] Add documentation

2007-10-13 Thread KOVACS Krisztian
Add basic usage instructions to Documentation/networking.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---
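Reader's note: the C fragment in the documentation below leaves out
declarations and error handling. The following is a fuller user-space
sketch of the same steps. The helper names fill_addr() and
open_transparent_socket() are hypothetical, and the fallback value 19
for IP_TRANSPARENT is the one used by current <linux/in.h>, assumed here
only for libc headers that lack the definition. On current kernels the
setsockopt() call is expected to fail with EPERM unless the process has
CAP_NET_ADMIN.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOL_IP
#define SOL_IP IPPROTO_IP
#endif
#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19   /* assumed fallback, see note above */
#endif

/* Fill an IPv4 socket address from host-order address and port. */
void fill_addr(struct sockaddr_in *name, uint32_t addr, uint16_t port)
{
    memset(name, 0, sizeof(*name));
    name->sin_family = AF_INET;
    name->sin_port = htons(port);
    name->sin_addr.s_addr = htonl(addr);
}

/* Returns a bound socket, or -1 with errno set. The option has to be
 * enabled before bind(), as the documentation states. */
int open_transparent_socket(const struct sockaddr_in *name)
{
    int value = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    if (setsockopt(fd, SOL_IP, IP_TRANSPARENT, &value, sizeof(value)) < 0 ||
        bind(fd, (const struct sockaddr *)name, sizeof(*name)) < 0) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller would do fill_addr(&name, 0xDEADBEEF, 0xCAFE) followed by
open_transparent_socket(&name), matching the values in the text below.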

 Documentation/networking/tproxy.txt |   62 +++
 1 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/tproxy.txt b/Documentation/networking/tproxy.txt
new file mode 100644
index 000..dfcb613
--- /dev/null
+++ b/Documentation/networking/tproxy.txt
@@ -0,0 +1,62 @@
+Transparent proxy support
+=========================
+
+This feature adds Linux 2.2-like transparent proxy support to current kernels.
+To use it, enable NETFILTER_TPROXY, the socket match and the TPROXY target in
+your kernel config. You will need policy routing too, so be sure to enable that
+as well.
+
+1. Making non-local sockets work
+================================
+
+The idea is that you identify packets with destination address matching a local
+socket on your box, set the packet mark to a certain value, and then match on that
+value using policy routing to have those packets delivered locally:
+
+# iptables -t mangle -N DIVERT
+# iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
+# iptables -t mangle -A DIVERT -j MARK --set-mark 1
+# iptables -t mangle -A DIVERT -j ACCEPT
+
+# ip rule add fwmark 1 lookup 100
+# ip route add local 0.0.0.0/0 dev lo table 100
+
+Because of certain restrictions in the IPv4 routing output code you'll have to
+modify your application to allow it to send datagrams _from_ non-local IP
+addresses. All you have to do is to enable the (SOL_IP, IP_TRANSPARENT) socket
+option before calling bind:
+
+fd = socket(AF_INET, SOCK_STREAM, 0);
+/* - 8< -*/
+int value = 1;
+setsockopt(fd, SOL_IP, IP_TRANSPARENT, &value, sizeof(value));
+/* - 8< -*/
+name.sin_family = AF_INET;
+name.sin_port = htons(0xCAFE);
+name.sin_addr.s_addr = htonl(0xDEADBEEF);
+bind(fd, (struct sockaddr *) &name, sizeof(name));
+
+A trivial patch for netcat is available here:
+http://people.netfilter.org/hidden/tproxy/netcat-ip_transparent-support.patch
+
+
+2. Redirecting traffic
+======================
+
+Transparent proxying often involves "intercepting" traffic on a router. This is
+usually done with the iptables REDIRECT target; however, that method has serious
+limitations. One of the major issues is that it actually
+modifies the packets to change the destination address -- which might not be
+acceptable in certain situations. (Think of proxying UDP for example: you won't
+be able to find out the original destination address. Even in case of TCP
+getting the original destination address is racy.)
+
+The 'TPROXY' target provides similar functionality without relying on NAT. Simply
+add rules like this to the iptables ruleset above:
+
+# iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY \
+  --tproxy-mark 0x1/0x1 --on-port 50080
+
+Note that for this to work you'll have to modify the proxy to enable (SOL_IP,
+IP_TRANSPARENT) for the listening socket.
+



[PATCH 11/14] iptables TPROXY target

2007-10-13 Thread KOVACS Krisztian
The TPROXY target implements redirection of non-local TCP/UDP traffic to local
sockets. Additionally, it's possible to manipulate the packet mark if and only
if a socket has been found. (We need this because we cannot use multiple
targets in the same iptables rule.)

Signed-off-by: Jan Engelhardt <[EMAIL PROTECTED]>
Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---
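Reader's note: the mark manipulation mentioned above is the single
expression (skb->mark & ~mark_mask) ^ mark_value in the target function
below. A minimal user-space sketch of that arithmetic follows; the
function name tproxy_apply_mark() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as in xt_tproxy_target(): clear the masked bits,
 * then XOR in the value. */
uint32_t tproxy_apply_mark(uint32_t mark, uint32_t mark_mask,
                           uint32_t mark_value)
{
    return (mark & ~mark_mask) ^ mark_value;
}
```

With a value/mask pair of 0x1/0x1, an existing mark of 0x10 becomes
0x11. Because the value is XORed rather than ORed in, value bits that
lie outside the mask toggle the corresponding mark bits instead of
setting them.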

 include/linux/netfilter/xt_TPROXY.h |   14 
 net/netfilter/Kconfig   |   14 
 net/netfilter/Makefile  |1 
 net/netfilter/xt_TPROXY.c   |  113 +++
 4 files changed, 142 insertions(+), 0 deletions(-)

diff --git a/include/linux/netfilter/xt_TPROXY.h b/include/linux/netfilter/xt_TPROXY.h
new file mode 100644
index 000..152e8f9
--- /dev/null
+++ b/include/linux/netfilter/xt_TPROXY.h
@@ -0,0 +1,14 @@
+#ifndef _XT_TPROXY_H_target
+#define _XT_TPROXY_H_target
+
+/* TPROXY target is capable of marking the packet to perform
+ * redirection. We can get rid of that whenever we get support for
+ * multiple targets in the same rule. */
+struct xt_tproxy_target_info {
+   u_int32_t mark_mask;
+   u_int32_t mark_value;
+   __be32 laddr;
+   __be16 lport;
+};
+
+#endif /* _XT_TPROXY_H_target */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 47976b5..c80f08a 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -366,6 +366,20 @@ config NETFILTER_XT_TARGET_NOTRACK
  If you want to compile it as a module, say M here and read
  .  If unsure, say `N'.
 
+config NETFILTER_XT_TARGET_TPROXY
+   tristate '"TPROXY" target support (EXPERIMENTAL)'
+   depends on EXPERIMENTAL
+   depends on NETFILTER_TPROXY
+   depends on NETFILTER_XTABLES
+   select NF_DEFRAG_IPV4
+   help
+ This option adds a `TPROXY' target, which is somewhat similar to
+ REDIRECT.  It can only be used in the mangle table and is useful
+ to redirect traffic to a transparent proxy.  It does _not_ depend
+ on Netfilter connection tracking and NAT, unlike REDIRECT.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_TARGET_TRACE
tristate  '"TRACE" target support'
depends on NETFILTER_XTABLES
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 2303ef3..4af92fe 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NOTRACK) += xt_NOTRACK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_SECMARK) += xt_SECMARK.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_TPROXY) += xt_TPROXY.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TCPMSS) += xt_TCPMSS.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 
diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
new file mode 100644
index 000..9222a8f
--- /dev/null
+++ b/net/netfilter/xt_TPROXY.c
@@ -0,0 +1,113 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2006-2007 BalaBit IT Ltd.
+ * Author: Balazs Scheidler, Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+static unsigned int
+xt_tproxy_target(struct sk_buff **pskb,
+const struct net_device *in,
+const struct net_device *out,
+unsigned int hooknum,
+const struct xt_target *target,
+const void *targinfo)
+{
+   const struct iphdr *iph = ip_hdr(*pskb);
+   const struct xt_tproxy_target_info *tgi = targinfo;
+   struct sk_buff *skb = *pskb;
+   struct udphdr _hdr, *hp;
+   struct sock *sk;
+
+   hp = skb_header_pointer(*pskb, ip_hdrlen(skb), sizeof(_hdr), &_hdr);
+   if (hp == NULL)
+   return NF_DROP;
+
+   sk = nf_tproxy_get_sock_v4(iph->protocol,
+  iph->saddr, tgi->laddr ? tgi->laddr : iph->daddr,
+  hp->source, tgi->lport ? tgi->lport : hp->dest,
+  in, true);
+
+   /* NOTE: assign_sock consumes our sk reference */
+   if (sk && nf_tproxy_assign_sock(skb, sk)) {
+   /* This should be in a separate target, but we don't do multiple
+  targets on the same rule yet */
+   skb->mark = (skb->mark & ~tgi->mark_mask) ^ tgi->mark_value;
+
+   pr_debug("redirecting: proto %u %08x:%u -> %08x:%u, mark: %x\n",
+iph->

[PATCH 12/14] Don't lookup the socket if there's a socket attached to the skb

2007-10-13 Thread KOVACS Krisztian
Use the socket cached in the TPROXY target if it's present.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---

 net/ipv4/tcp_ipv4.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index fb471b0..90ee2ca 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1662,6 +1662,14 @@ int tcp_v4_rcv(struct sk_buff *skb)
TCP_SKB_CB(skb)->flags   = iph->tos;
TCP_SKB_CB(skb)->sacked  = 0;
 
+#if defined(CONFIG_NETFILTER_TPROXY) || defined(CONFIG_NETFILTER_TPROXY_MODULE)
+   if (unlikely(skb->sk)) {
+   /* steal reference */
+   sk = skb->sk;
+   skb->destructor = NULL;
+   skb->sk = NULL;
+   } else
+#endif
sk = __inet_lookup(&tcp_hashinfo, iph->saddr, th->source,
   iph->daddr, th->dest, inet_iif(skb));
if (!sk)



[PATCH 10/14] iptables socket match

2007-10-13 Thread KOVACS Krisztian
Add iptables 'socket' match, which matches packets for which a TCP/UDP
socket lookup succeeds.

Signed-off-by: Jan Engelhardt <[EMAIL PROTECTED]>
Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---
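Reader's note: besides the lookup itself, the patch below restricts
where the match may be used: xt_socket_checkentry() accepts a rule only
if it carries a non-inverted -p tcp or -p udp. A minimal user-space
sketch of that test follows. The function name is hypothetical, and
INV_PROTO is a stand-in for IPT_INV_PROTO whose value 0x40 is an
assumption here.

```c
#include <assert.h>
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */
#include <stdbool.h>

#define INV_PROTO 0x40    /* stand-in for IPT_INV_PROTO; value assumed */

/* Mirrors the condition in xt_socket_checkentry(): the rule must name
 * TCP or UDP, and the protocol match must not be inverted. */
bool socket_match_rule_ok(unsigned int proto, unsigned int invflags)
{
    return (proto == IPPROTO_TCP || proto == IPPROTO_UDP) &&
           !(invflags & INV_PROTO);
}
```

A rule without -p (protocol 0) or with "! -p tcp" is rejected under this
model, which matches the pr_info() message in the patch.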

 net/netfilter/Kconfig |   14 ++
 net/netfilter/Makefile|1 
 net/netfilter/xt_socket.c |   99 +
 3 files changed, 114 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 5bb4afb..47976b5 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -635,6 +635,20 @@ config NETFILTER_XT_MATCH_SCTP
  If you want to compile it as a module, say M here and read
  .  If unsure, say `N'.
 
+config NETFILTER_XT_MATCH_SOCKET
+   tristate '"socket" match support (EXPERIMENTAL)'
+   depends on EXPERIMENTAL
+   depends on NETFILTER_TPROXY
+   depends on NETFILTER_XTABLES
+   select NF_DEFRAG_IPV4
+   help
+ This option adds a `socket' match, which can be used to match
+ packets for which a TCP or UDP socket lookup finds a valid socket.
+ It can be used in combination with the MARK target and policy
+ routing to implement full featured non-locally bound sockets.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_STATE
tristate '"state" match support'
depends on NETFILTER_XTABLES
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 5066297..2303ef3 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -73,6 +73,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_SCTP) += xt_sctp.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_SOCKET) += xt_socket.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_STATE) += xt_state.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_STATISTIC) += xt_statistic.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_STRING) += xt_string.o
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
new file mode 100644
index 000..f2e0846
--- /dev/null
+++ b/net/netfilter/xt_socket.c
@@ -0,0 +1,99 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (C) 2007 BalaBit IT Ltd.
+ * Author: Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static bool
+xt_socket_match(const struct sk_buff *skb,
+const struct net_device *in,
+const struct net_device *out,
+const struct xt_match *match,
+const void *matchinfo,
+int offset,
+unsigned int protoff,
+bool *hotdrop)
+{
+   const struct iphdr *iph = ip_hdr(skb);
+   struct udphdr _hdr, *hp;
+   struct sock *sk;
+
+   hp = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(_hdr), &_hdr);
+   if (hp == NULL)
+   return false;
+
+   sk = nf_tproxy_get_sock_v4(iph->protocol,
+  iph->saddr, iph->daddr,
+  hp->source, hp->dest, in, false);
+   if (sk != NULL)
+   nf_tproxy_put_sock(sk);
+
+   pr_debug("socket match: proto %u %08x:%u -> %08x:%u sock %p\n",
+iph->protocol, ntohl(iph->saddr), ntohs(hp->source),
+ntohl(iph->daddr), ntohs(hp->dest), sk);
+
+   return (sk != NULL);
+}
+
+static bool
+xt_socket_checkentry(const char *tablename,
+const void *entry,
+const struct xt_match *match,
+void *matchinfo,
+unsigned int hook_mask)
+{
+   const struct ipt_ip *i = entry;
+
+   if ((i->proto == IPPROTO_TCP || i->proto == IPPROTO_UDP)
+   && !(i->invflags & IPT_INV_PROTO))
+   return true;
+
+   pr_info("xt_socket: Can be used only in combination with "
+   "either -p tcp or -p udp\n");
+   return false;
+}
+
+static struct xt_match xt_socket_reg __read_mostly = {
+   .name   = "socket",
+   .family = AF_INET,
+   .match  = xt_socket_match,
+   .checkentry = xt_socket_checkentry,
+   .hooks  = (1 << NF_IP_PRE_ROUTING),
+   .me = THIS_MODULE,
+};
+
+static int __init xt_socket_init(void)
+{
+   nf_defrag_ipv4_enable();
+   return xt_register_match(&xt_socket_reg);
+}
+
+static void __exit xt_socket_fini(void)
+{
+   xt_unregister_match(&xt_socket_reg);
+}
+
+module_init(xt_socket_in

Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed

2007-06-01 Thread KOVACS Krisztian

  Hi,

On Friday 01 June 2007 01:18, Julian Anastasov wrote:
>   What about something like this, it even reduces checks
> in the fast path. You can post new version if the following change
> looks good to you and to other developers. If additional sign line is
> needed here it is:
>
> Signed-off-by: Julian Anastasov <[EMAIL PROTECTED]>
>
>[...]
>   Or we can go further and to avoid ip_dev_find? For me, this
> second variant is preferred because calling ip_dev_find() is useless
> for FLOWI_FLAG_ANYSRC.

  You're right. Although I don't really like duplicating the ip_dev_find()
call, it's still better than the previous patch.

-- 
 Regards,
  Krisztian Kovacs


Loosen source address check on IPv4 output

ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. This obviously makes using
such addresses impossible.

This patch introduces a flowi flag which makes omitting this check
possible.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
Signed-off-by: Julian Anastasov <[EMAIL PROTECTED]>
---

 include/net/flow.h |1 +
 net/ipv4/route.c   |   20 +---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index f3cc1f8..1bfc0dc 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -49,6 +49,7 @@ struct flowi {
__u8proto;
__u8flags;
 #define FLOWI_FLAG_MULTIPATHOLDROUTE 0x01
+#define FLOWI_FLAG_ANYSRC 0x02
union {
struct {
__be16  sport;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8603cfb..4acd3de 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2394,11 +2394,6 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
ZERONET(oldflp->fl4_src))
goto out;
 
-   /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
-   dev_out = ip_dev_find(oldflp->fl4_src);
-   if (dev_out == NULL)
-   goto out;
-
/* I removed check for oif == dev_out->oif here.
   It was wrong for two reasons:
   1. ip_dev_find(saddr) can return wrong iface, if saddr is
@@ -2409,6 +2404,11 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
 
if (oldflp->oif == 0
&& (MULTICAST(oldflp->fl4_dst) || oldflp->fl4_dst == htonl(0xFFFFFFFF))) {
+   /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+   dev_out = ip_dev_find(oldflp->fl4_src);
+   if (dev_out == NULL)
+   goto out;
+
/* Special hack: user can direct multicasts
   and limited broadcast via necessary interface
   without fiddling with IP_MULTICAST_IF or IP_PKTINFO.
@@ -2427,9 +2427,15 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
fl.oif = dev_out->ifindex;
goto make_route;
}
-   if (dev_out)
+
+   if (!(oldflp->flags & FLOWI_FLAG_ANYSRC)) {
+   /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+   dev_out = ip_dev_find(oldflp->fl4_src);
+   if (dev_out == NULL)
+   goto out;
dev_put(dev_out);
-   dev_out = NULL;
+   dev_out = NULL;
+   }
}
 
 


Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed

2007-05-31 Thread KOVACS Krisztian

  Hi,

On Thursday 31 May 2007 02:21, Julian Anastasov wrote:
> >   I've posted a few patches making omitting this check possible
> > selectively back in March. Do those changes look acceptable?
> >
> >   http://marc.info/?l=linux-netdev&m=117310979823297&w=3
>   Also, i'm not sure if FLOWI_FLAG_TRANSPARENT should cause
> different values for flags to be cached many times. Users without this
> flag get EINVAL when fl4_src is not configured, other failures are not
> cached too. And as fl4_src is considered in both cases (both kinds of
> callers get same path on success) we don't need changes except in
> ip_route_output_slow()? By this way I hope we can avoid any possible
> forking of cache entries just by different flags.

  Indeed, for output it probably does not matter, I've removed the flags
check from the flow index compare routine.

>   Then we can use some more generic name, only for the flowi flag,
> eg. FLOWI_FLAG_ANYSRC or something better?

  You're right, _TRANSPARENT was a bad idea. I'm not very good at
choosing names.

  So what about this one?



Loosen source address check on IPv4 output

From: KOVACS Krisztian <[EMAIL PROTECTED]>

ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. This obviously makes using
such addresses impossible.

This patch introduces a flowi flag which makes omitting this check
possible. The new flag provides a way of handling transparent and
non-transparent connections differently.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
---

 include/net/flow.h |1 +
 net/ipv4/route.c   |   47 +--
 2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index f3cc1f8..1bfc0dc 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -49,6 +49,7 @@ struct flowi {
__u8proto;
__u8flags;
 #define FLOWI_FLAG_MULTIPATHOLDROUTE 0x01
+#define FLOWI_FLAG_ANYSRC 0x02
union {
struct {
__be16  sport;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8603cfb..88d0a79 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2396,7 +2396,7 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
 
/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
dev_out = ip_dev_find(oldflp->fl4_src);
-   if (dev_out == NULL)
+   if (dev_out == NULL && !(oldflp->flags & FLOWI_FLAG_ANYSRC))
goto out;
 
/* I removed check for oif == dev_out->oif here.
@@ -2407,29 +2407,32 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
  of another iface. --ANK
 */
 
-   if (oldflp->oif == 0
-   && (MULTICAST(oldflp->fl4_dst) || oldflp->fl4_dst == htonl(0xFFFFFFFF))) {
-   /* Special hack: user can direct multicasts
-  and limited broadcast via necessary interface
-  without fiddling with IP_MULTICAST_IF or IP_PKTINFO.
-  This hack is not just for fun, it allows
-  vic,vat and friends to work.
-  They bind socket to loopback, set ttl to zero
-  and expect that it will work.
-  From the viewpoint of routing cache they are broken,
-  because we are not allowed to build multicast path
-  with loopback source addr (look, routing cache
-  cannot know, that ttl is zero, so that packet
-  will not leave this host and route is valid).
-  Luckily, this hack is good workaround.
-*/
+   if (dev_out) {
+   if (oldflp->oif == 0
+   && (MULTICAST(oldflp->fl4_dst)
+   || oldflp->fl4_dst == htonl(0xFFFFFFFF))) {
+   /* Special hack: user can direct multicasts
+  and limited broadcast via necessary interface
+  without fiddling with IP_MULTICAST_IF or IP_PKTINFO.
+  This hack is not just for fun, it allows
+  vic,vat and friends to work.
+  They bind socket to loopback, set ttl to zero
+  and expect that it will work.
+  From the viewpoint of routing cache they are broken,
+  because we are not allowed to build 
multicast 

Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed

2007-05-30 Thread KOVACS Krisztian

  Hi,

On Friday 18 May 2007 11:05, David Miller wrote:
> From: Julian Anastasov <[EMAIL PROTECTED]>
> Date: Fri, 18 May 2007 11:40:54 +0300 (EEST)
>
> > On Thu, 17 May 2007, Patrick McHardy wrote:
> > > In any case some better solution than the current one needs to be
> > > found, allowing users to send spoofed packets is far worse than
> > > using a non-desired source address for ICMP packets.
> >
> > yes, I would prefer the sysctl_ip_nonlocal_bind change to be
> > removed until such solution is found.
>
> Ok, I'll revert it.

  I'm just about to publish the next round of tproxy patches (with the 
routing code modifications completely removed), but this issue is still 
present.

  I posted a few patches back in March that make it possible to omit this
check selectively. Do those changes look acceptable?

  http://marc.info/?l=linux-netdev&m=117310979823297&w=3

  And the related socket layer changes:

  http://marc.info/?l=linux-netdev&m=117310979815374&w=3
  http://marc.info/?l=linux-netdev&m=117310979902806&w=3
  http://marc.info/?l=linux-netdev&m=117310980027541&w=3

-- 
 Regards,
  Krisztian Kovacs


[PATCH/RFC 13/13] iptables tproxy match

2007-03-05 Thread KOVACS Krisztian
Implements an iptables module which matches packets which have the
tproxy flag set, that is, packets diverted in the tproxy table.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 net/netfilter/Kconfig |9 +
 net/netfilter/Makefile|1 +
 net/netfilter/xt_tproxy.c |   77 +
 3 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 253fce3..b22346e 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -603,6 +603,15 @@ config NETFILTER_XT_MATCH_QUOTA
  If you want to compile it as a module, say M here and read
  .  If unsure, say `N'.
 
+config NETFILTER_XT_MATCH_TPROXY
+   tristate '"tproxy" match support'
+   depends on NETFILTER_XTABLES
+   help
+ This option adds a `tproxy' match, which allows you to match
+ packets which have been diverted to local sockets by TProxy.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_REALM
tristate  '"realm" match support'
depends on NETFILTER_XTABLES
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index b2b5c75..83b2fd9 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -64,6 +64,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MARK) += xt_mark.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_TPROXY) += xt_tproxy.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_SCTP) += xt_sctp.o
diff --git a/net/netfilter/xt_tproxy.c b/net/netfilter/xt_tproxy.c
new file mode 100644
index 000..53f8bee
--- /dev/null
+++ b/net/netfilter/xt_tproxy.c
@@ -0,0 +1,77 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2007 BalaBit IT Ltd.
+ * Author: Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+
+#include 
+
+static int
+match(const struct sk_buff *skb,
+  const struct net_device *in,
+  const struct net_device *out,
+  const struct xt_match *match,
+  const void *matchinfo,
+  int offset,
+  unsigned int protoff,
+  int *hotdrop)
+{
+   return skb->ip_tproxy;
+}
+
+static int
+check(const char *tablename,
+  const void *entry,
+  const struct xt_match *match,
+  void *matchinfo,
+  unsigned int hook_mask)
+{
+   return 1;
+}
+
+static struct xt_match tproxy_matches[] = {
+   {
+   .name   = "tproxy",
+   .match  = match,
+   .matchsize  = 0,
+   .checkentry = check,
+   .family = AF_INET,
+   .me = THIS_MODULE,
+   },
+   {
+   .name   = "tproxy",
+   .match  = match,
+   .matchsize  = 0,
+   .checkentry = check,
+   .family = AF_INET6,
+   .me = THIS_MODULE,
+   },
+};
+
+static int __init xt_tproxy_init(void)
+{
+   return xt_register_matches(tproxy_matches, ARRAY_SIZE(tproxy_matches));
+}
+
+static void __exit xt_tproxy_fini(void)
+{
+   xt_unregister_matches(tproxy_matches, ARRAY_SIZE(tproxy_matches));
+}
+
+module_init(xt_tproxy_init);
+module_exit(xt_tproxy_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Krisztian Kovacs <[EMAIL PROTECTED]>");
+MODULE_DESCRIPTION("iptables tproxy match module");
+MODULE_ALIAS("ipt_tproxy");
+MODULE_ALIAS("ip6t_tproxy");



[PATCH/RFC 12/13] iptables TPROXY target

2007-03-05 Thread KOVACS Krisztian
The TPROXY target implements redirection of non-local TCP/UDP traffic
to local sockets. It is simply a wrapper around functionality exported
from iptable_tproxy.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---
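Reader's note: the target below picks the redirect address and port with
the GNU C "?:" shorthand (tgi->laddr ? : iph->daddr), so a zero value in
the rule means "keep what the packet carries". A portable user-space
sketch of the same selection follows. The function names are
hypothetical, and plain integers are used in place of __be32/__be16.

```c
#include <assert.h>
#include <stdint.h>

/* Use the address configured in the rule if non-zero, otherwise the
 * packet's original destination address. */
uint32_t redirect_addr(uint32_t laddr, uint32_t daddr)
{
    return laddr ? laddr : daddr;
}

/* Same selection for the port. */
uint16_t redirect_port(uint16_t lport, uint16_t dport)
{
    return lport ? lport : dport;
}
```

One consequence of this encoding is that address 0.0.0.0 and port 0
cannot be given as explicit redirect targets.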

 include/linux/netfilter_ipv4/ipt_TPROXY.h |9 +++
 net/ipv4/netfilter/Kconfig|   11 +++
 net/ipv4/netfilter/Makefile   |1 
 net/ipv4/netfilter/ipt_TPROXY.c   |   92 +
 4 files changed, 113 insertions(+), 0 deletions(-)

diff --git a/include/linux/netfilter_ipv4/ipt_TPROXY.h b/include/linux/netfilter_ipv4/ipt_TPROXY.h
new file mode 100644
index 000..d05c956
--- /dev/null
+++ b/include/linux/netfilter_ipv4/ipt_TPROXY.h
@@ -0,0 +1,9 @@
+#ifndef _IPT_TPROXY_H_target
+#define _IPT_TPROXY_H_target
+
+struct ipt_tproxy_target_info {
+   u_int16_t lport;
+   u_int32_t laddr;
+};
+
+#endif
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 17c3ec8..ecd8da5 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -638,6 +638,17 @@ config IP_NF_TPROXY
 
  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP_NF_TARGET_TPROXY
+   tristate "TPROXY target support"
+   depends on IP_NF_TPROXY
+   help
+ This option adds a `TPROXY' target, which is somewhat similar to
+ REDIRECT.  It can only be used in the tproxy table and is useful
+ to redirect traffic to a transparent proxy.  It does _not_ depend
+ on Netfilter connection tracking.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 # ARP tables
 config IP_NF_ARPTABLES
tristate "ARP tables support"
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 21a29f4..a50a64e 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_IP_NF_TARGET_LOG) += ipt_LOG.o
 obj-$(CONFIG_IP_NF_TARGET_ULOG) += ipt_ULOG.o
 obj-$(CONFIG_IP_NF_TARGET_CLUSTERIP) += ipt_CLUSTERIP.o
 obj-$(CONFIG_IP_NF_TARGET_TTL) += ipt_TTL.o
+obj-$(CONFIG_IP_NF_TARGET_TPROXY) += ipt_TPROXY.o
 
 # generic ARP tables
 obj-$(CONFIG_IP_NF_ARPTABLES) += arp_tables.o
diff --git a/net/ipv4/netfilter/ipt_TPROXY.c b/net/ipv4/netfilter/ipt_TPROXY.c
new file mode 100644
index 000..89a08b1
--- /dev/null
+++ b/net/ipv4/netfilter/ipt_TPROXY.c
@@ -0,0 +1,92 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2006-2007 BalaBit IT Ltd.
+ * Author: Balazs Scheidler, Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+static unsigned int
+target(struct sk_buff **pskb,
+   const struct net_device *in,
+   const struct net_device *out,
+   unsigned int hooknum,
+   const struct xt_target *target,
+   const void *targinfo)
+{
+   const struct iphdr *iph = (*pskb)->nh.iph;
+   const struct ipt_tproxy_target_info *tgi =
+   (const struct ipt_tproxy_target_info *) targinfo;
+   unsigned int verdict = NF_ACCEPT;
+   struct sk_buff *skb = *pskb;
+   struct udphdr _hdr, *hp;
+   struct sock *sk;
+   __be32 daddr;
+   __be16 dport;
+
+   /* TCP/UDP only */
+   if ((iph->protocol != IPPROTO_TCP) &&
+   (iph->protocol != IPPROTO_UDP))
+   return NF_ACCEPT;
+
+   hp = skb_header_pointer(*pskb, iph->ihl * 4, sizeof(_hdr), &_hdr);
+   if (hp == NULL)
+   return NF_DROP;
+
+   daddr = tgi->laddr ? : iph->daddr;
+   dport = tgi->lport ? : hp->dest;
+   sk = ip_tproxy_get_sock(iph->protocol,
+   iph->saddr, daddr,
+   hp->source, dport, in);
+   if (sk != NULL) {
+   if (ip_tproxy_do_divert(skb, sk, 0, in) < 0)
+   verdict = NF_DROP;
+
+   if ((iph->protocol == IPPROTO_TCP) && (sk->sk_state == TCP_TIME_WAIT))
+   inet_twsk_put(inet_twsk(sk));
+   else
+   sock_put(sk);
+   }
+
+   return verdict;
+}
+
+static struct xt_target ipt_tproxy_reg = {
+   .name   = "TPROXY",
+   .family = AF_INET,
+   .target = target,
+   .targetsize = sizeof(struct ipt_tproxy_target_info),
+   .table  = "tproxy",
+   .me = THIS_MODULE,
+};
+
+static int __init init(void)
+{
+   return xt_register_target(&ipt_tproxy_reg);
+}
+
+static void __exit fini(void)
+{
+   xt_unregister_target(&ipt_tproxy_reg);
+}
+
+module_init(init);
+module_exit(fini);
+MODULE_LICENSE("

[PATCH/RFC 11/13] iptables tproxy table

2007-03-05 Thread KOVACS Krisztian
The iptables tproxy table registers a new hook on PRE_ROUTING and does the
following for each incoming TCP/UDP packet:

1. Does IPv4 fragment reassembly. We need this to be able to do TCP/UDP
   header processing.

2. Does a TCP/UDP socket hash lookup to decide whether or not the packet
   is destined for a non-locally bound socket. If a matching socket is found
   and the socket has the IP_TRANSPARENT socket option enabled, the skb is
   diverted locally and the socket reference is stored in the skb.

3. If no matching socket was found, the PREROUTING chain of the
   iptables tproxy table is consulted. Matching rules with the TPROXY
   target can do transparent redirection here. (In this case it is not
   necessary to have the IP_TRANSPARENT socket option enabled for the
   target socket, redirection takes place even for "regular"
   sockets. This way no modification of the application is necessary.)
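
The three steps above can be modelled in plain userland C. This is only a
sketch of the decision order, not the kernel code: the names
(model_sock, tproxy_decide, the enum values) are hypothetical, and passing
the packet on when a socket is found without IP_TRANSPARENT is an
assumption, since the description does not spell out that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names exist in the kernel. */
enum tproxy_action {
    TPROXY_DIVERT_TO_SOCKET,  /* step 2: socket hash hit, transparent socket */
    TPROXY_DIVERT_BY_RULE,    /* step 3: a TPROXY rule supplied the target  */
    TPROXY_PASS               /* nothing matched, normal processing goes on */
};

struct model_sock {
    bool transparent;         /* stands in for the IP_TRANSPARENT flag */
};

/*
 * found:       result of the socket hash lookup (NULL if no match)
 * rule_target: socket selected by a matching TPROXY rule (NULL if none)
 */
static enum tproxy_action
tproxy_decide(const struct model_sock *found,
              const struct model_sock *rule_target)
{
    if (found != NULL && found->transparent)
        return TPROXY_DIVERT_TO_SOCKET;

    /* The rule path does not require the transparent flag on the target. */
    if (found == NULL && rule_target != NULL)
        return TPROXY_DIVERT_BY_RULE;

    /* Assumption: a non-transparent hash hit is simply passed on. */
    return TPROXY_PASS;
}
```

Note that the rule path deliberately ignores the transparent flag of the
target, matching the remark that "regular" sockets can be redirected to.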

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/linux/netfilter_ipv4.h   |1 
 include/linux/netfilter_ipv4/ip_tproxy.h |   20 ++
 include/net/ip.h |3 
 net/ipv4/netfilter/Kconfig   |   10 +
 net/ipv4/netfilter/Makefile  |1 
 net/ipv4/netfilter/iptable_tproxy.c  |  267 ++
 6 files changed, 301 insertions(+), 1 deletions(-)

diff --git a/include/linux/netfilter_ipv4.h b/include/linux/netfilter_ipv4.h
index ceae87a..cc4d83b 100644
--- a/include/linux/netfilter_ipv4.h
+++ b/include/linux/netfilter_ipv4.h
@@ -58,6 +58,7 @@ enum nf_ip_hook_priorities {
NF_IP_PRI_SELINUX_FIRST = -225,
NF_IP_PRI_CONNTRACK = -200,
NF_IP_PRI_MANGLE = -150,
+   NF_IP_PRI_TPROXY = -125,
NF_IP_PRI_NAT_DST = -100,
NF_IP_PRI_FILTER = 0,
NF_IP_PRI_NAT_SRC = 100,
diff --git a/include/linux/netfilter_ipv4/ip_tproxy.h b/include/linux/netfilter_ipv4/ip_tproxy.h
new file mode 100644
index 000..ae890e3
--- /dev/null
+++ b/include/linux/netfilter_ipv4/ip_tproxy.h
@@ -0,0 +1,20 @@
+#ifndef _IP_TPROXY_H
+#define _IP_TPROXY_H
+
+#include 
+
+/* look up and get a reference to a matching socket */
+extern struct sock *
+ip_tproxy_get_sock(const u8 protocol,
+  const __be32 saddr, const __be32 daddr,
+  const __be16 sport, const __be16 dport,
+  const struct net_device *in);
+
+/* divert skb to a given socket */
+extern int
+ip_tproxy_do_divert(struct sk_buff *skb,
+   const struct sock *sk,
+   const int require_freebind,
+   const struct net_device *in);
+
+#endif
diff --git a/include/net/ip.h b/include/net/ip.h
index 8b71991..a589e6e 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -321,7 +321,8 @@ enum ip_defrag_users
IP_DEFRAG_CONNTRACK_OUT,
IP_DEFRAG_VS_IN,
IP_DEFRAG_VS_OUT,
-   IP_DEFRAG_VS_FWD
+   IP_DEFRAG_VS_FWD,
+   IP_DEFRAG_TP_IN,
 };
 
 struct sk_buff *ip_defrag(struct sk_buff *skb, u32 user);
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 601808c..17c3ec8 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -628,6 +628,16 @@ config IP_NF_RAW
  If you want to compile it as a module, say M here and read
  .  If unsure, say `N'.
 
+# tproxy table
+config IP_NF_TPROXY
+   tristate "Transparent proxying"
+   depends on IP_NF_IPTABLES
+   help
+ Transparent proxying. For more information see
+ http://www.balabit.com/downloads/tproxy.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 # ARP tables
 config IP_NF_ARPTABLES
tristate "ARP tables support"
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 6625ec6..21a29f4 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -81,6 +81,7 @@ obj-$(CONFIG_IP_NF_MANGLE) += iptable_mangle.o
 obj-$(CONFIG_IP_NF_NAT) += iptable_nat.o
 obj-$(CONFIG_NF_NAT) += iptable_nat.o
 obj-$(CONFIG_IP_NF_RAW) += iptable_raw.o
+obj-$(CONFIG_IP_NF_TPROXY) += iptable_tproxy.o
 
 # matches
 obj-$(CONFIG_IP_NF_MATCH_IPRANGE) += ipt_iprange.o
diff --git a/net/ipv4/netfilter/iptable_tproxy.c b/net/ipv4/netfilter/iptable_tproxy.c
new file mode 100644
index 000..a241f11
--- /dev/null
+++ b/net/ipv4/netfilter/iptable_tproxy.c
@@ -0,0 +1,267 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2006-2007 BalaBit IT Ltd.
+ * Author: Balazs Scheidler, Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define TPROXY_VALID_HOOKS (1 << NF_IP_PRE_ROUTING)

[PATCH/RFC 09/13] Create a tproxy flag in struct sk_buff

2007-03-05 Thread KOVACS Krisztian
We would like to be able to match on whether or not a given packet has
been diverted by tproxy. To make this possible we need a flag in
sk_buff.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/linux/skbuff.h |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ff3940..6d7f5c7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -284,7 +284,8 @@ struct sk_buff {
nfctinfo:3;
__u8pkt_type:3,
fclone:2,
-   ipvs_property:1;
+   ipvs_property:1,
+   ip_tproxy:1;
__be16  protocol;
 
void(*destructor)(struct sk_buff *skb);

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC 10/13] Export UDP socket lookup function

2007-03-05 Thread KOVACS Krisztian
The iptables tproxy code has to be able to do UDP socket hash lookups,
so we have to provide an exported lookup function for this purpose.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/net/udp.h |4 
 net/ipv4/udp.c|8 
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 1b921fa..ea5aa31 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -141,6 +141,10 @@ extern int udp_lib_setsockopt(struct sock *sk, int level, int optname,
   char __user *optval, int optlen,
   int (*push_pending_frames)(struct sock *));
 
+extern struct sock *udp4_lib_lookup(__be32 saddr, __be16 sport,
+   __be32 daddr, __be16 dport,
+   int dif);
+
 DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
 /*
  * SNMP statistics for UDP and UDP-Lite
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1d15edc..52695a6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -285,6 +285,14 @@ static struct sock *__udp4_lib_lookup(__be32 saddr, __be16 sport,
return result;
 }
 
+struct sock *udp4_lib_lookup(__be32 saddr, __be16 sport,
+__be32 daddr, __be16 dport,
+int dif)
+{
+   return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udp_hash);
+}
+EXPORT_SYMBOL_GPL(udp4_lib_lookup);
+
 static inline struct sock *udp_v4_mcast_next(struct sock *sk,
 __be16 loc_port, __be32 loc_addr,
 __be16 rmt_port, __be32 rmt_addr,



[PATCH/RFC 08/13] Handle TCP SYN+ACK/ACK/RST transparency

2007-03-05 Thread KOVACS Krisztian
The TCP stack sends out SYN+ACK/ACK/RST reply packets in response to
incoming packets. The non-local source address check on output bites
us again, as replies for transparently redirected traffic won't have a
chance to leave the node.

This patch selectively sets the FLOWI_FLAG_TRANSPARENT flag when doing
the route lookup for those replies. Transparent replies are enabled if
the listening socket has the transparent socket flag set.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/net/ip.h|3 +++
 include/net/request_sock.h  |3 ++-
 net/ipv4/inet_connection_sock.c |2 ++
 net/ipv4/ip_output.c|6 +-
 net/ipv4/syncookies.c   |2 ++
 net/ipv4/tcp_ipv4.c |   16 ++--
 net/ipv4/tcp_minisocks.c|3 ++-
 7 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index e79c3e3..8b71991 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -133,8 +133,11 @@ static inline void ip_tr_mc_map(__be32 addr, char *buf)
buf[5]=0x00;
 }
 
+#define IP_REPLY_ARG_NOSRCCHECK 1
+
 struct ip_reply_arg {
struct kvec iov[1];   
+   int flags;
__wsum  csum;
int csumoffset; /* u16 offset of csum in iov[0].iov_base */
/* -1 if not needed */ 
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 7aed02c..b9c8974 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -34,7 +34,8 @@ struct request_sock_ops {
   struct request_sock *req,
   struct dst_entry *dst);
void(*send_ack)(struct sk_buff *skb,
-   struct request_sock *req);
+   struct request_sock *req,
+   int reply_flags);
void(*send_reset)(struct sock *sk,
  struct sk_buff *skb);
void(*destructor)(struct request_sock *req);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 83ad972..90459a1 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -323,6 +323,8 @@ struct dst_entry* inet_csk_route_req(struct sock *sk,
.saddr = ireq->loc_addr,
.tos = RT_CONN_FLAGS(sk) } },
.proto = sk->sk_protocol,
+   .flags = inet_sk(sk)->transparent ?
+   FLOWI_FLAG_TRANSPARENT : 0,
.uli_u = { .ports =
   { .sport = inet_sk(sk)->sport,
 .dport = ireq->rmt_port } } };
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d096332..7af25d4 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -312,6 +312,8 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
.saddr = inet->saddr,
.tos = RT_CONN_FLAGS(sk) } },
.proto = sk->sk_protocol,
+   .flags = inet->transparent ?
+FLOWI_FLAG_TRANSPARENT : 0,
.uli_u = { .ports =
   { .sport = inet->sport,
 .dport = inet->dport } } };
@@ -1357,7 +1359,9 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
.uli_u = { .ports =
   { .sport = skb->h.th->dest,
 .dport = skb->h.th->source } },
-   .proto = sk->sk_protocol };
+   .proto = sk->sk_protocol,
+   .flags = (arg->flags & IP_REPLY_ARG_NOSRCCHECK) ?
+   FLOWI_FLAG_TRANSPARENT : 0 };
security_skb_classify_flow(skb, &fl);
if (ip_route_output_key(&rt, &fl))
return;
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 431c81d..08d8920 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -261,6 +261,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
.saddr = ireq->loc_addr,
.tos = RT_CONN_FLAGS(sk) } },
.proto = I

[PATCH/RFC 07/13] Conditionally enable transparent flow flag when connecting

2007-03-05 Thread KOVACS Krisztian
Set FLOWI_FLAG_TRANSPARENT in flowi->flags if the socket has the
transparent socket option set. This way we selectively enable certain
connections with non-local source addresses to be routed.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/net/route.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 13da592..4dff368 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -161,6 +161,10 @@ static inline int ip_route_connect(struct rtable **rp, __be32 dst,
 .dport = dport } } };
 
int err;
+
+   if (inet_sk(sk)->transparent)
+   fl.flags |= FLOWI_FLAG_TRANSPARENT;
+
if (!dst || !src) {
err = __ip_route_output_key(rp, &fl);
if (err)



[PATCH/RFC 06/13] Implement IP_TRANSPARENT socket option

2007-03-05 Thread KOVACS Krisztian
This patch introduces the IP_TRANSPARENT socket option: enabling it makes
IPv4 routing omit the non-local source address check on output. Setting
IP_TRANSPARENT requires the CAP_NET_ADMIN capability.
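
From userland, enabling the option could look like the sketch below. It
assumes a libc whose headers define IP_TRANSPARENT (falling back to the
value 19 from this patch otherwise); the helper name
open_transparent_tcp_socket() is made up for illustration. Without
CAP_NET_ADMIN the call is expected to fail with EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19   /* value added to include/linux/in.h by this patch */
#endif

/*
 * Hypothetical helper: create a TCP socket and set IP_TRANSPARENT on it.
 * Returns the file descriptor on success or -errno on failure.
 */
static int open_transparent_tcp_socket(void)
{
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -errno;

    /* The option is an int; any non-zero value enables the flag. */
    if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0) {
        int err = errno;

        close(fd);
        return -err;
    }
    return fd;
}
```

A proxy would call this before bind() or connect(), so that the flag is
already set when the first route lookup for the socket happens.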

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/linux/in.h   |1 +
 include/net/inet_sock.h  |3 ++-
 include/net/inet_timewait_sock.h |3 ++-
 include/net/route.h  |1 +
 net/ipv4/inet_timewait_sock.c|1 +
 net/ipv4/ip_sockglue.c   |   12 +++-
 6 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/in.h b/include/linux/in.h
index 1912e7c..66be615 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -75,6 +75,7 @@ struct in_addr {
 #define IP_IPSEC_POLICY16
 #define IP_XFRM_POLICY 17
 #define IP_PASSSEC 18
+#define IP_TRANSPARENT 19
 
 /* BSD compatibility */
 #define IP_RECVRETOPTS IP_RETOPTS
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 0bd167b..14b597d 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -128,7 +128,8 @@ struct inet_sock {
is_icsk:1,
freebind:1,
hdrincl:1,
-   mc_loop:1;
+   mc_loop:1,
+   transparent:1;
int mc_index;
__be32  mc_addr;
struct ip_mc_socklist   *mc_list;
diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index f7be1ac..e30dd61 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -126,7 +126,8 @@ struct inet_timewait_sock {
__be16  tw_dport;
__u16   tw_num;
/* And these are ours. */
-   __u8tw_ipv6only:1;
+   __u8tw_ipv6only:1,
+   tw_transparent:1;
/* 15 bits hole, try to pack */
__u16   tw_ipv6_offset;
int tw_timeout;
diff --git a/include/net/route.h b/include/net/route.h
index efaa6b2..13da592 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index a73cf93..f57f81a 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -108,6 +108,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int stat
tw->tw_reuse= sk->sk_reuse;
tw->tw_hash = sk->sk_hash;
tw->tw_ipv6only = 0;
+   tw->tw_transparent  = inet->transparent;
tw->tw_prot = sk->sk_prot_creator;
atomic_set(&tw->tw_refcnt, 1);
inet_twsk_dead_node_init(tw);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 23048d9..02e8d9f 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -414,7 +414,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
(1<= sizeof(int)) {
@@ -875,6 +875,16 @@ mc_msf_out:
err = xfrm_user_policy(sk, optname, optval, optlen);
break;
 
+   case IP_TRANSPARENT:
+   if (!capable(CAP_NET_ADMIN)) {
+   err = -EPERM;
+   break;
+   }
+   if (optlen < 1)
+   goto e_inval;
+   inet->transparent = !!val;
+   break;
+
default:
err = -ENOPROTOOPT;
break;



[PATCH/RFC 04/13] Don't do the UDP socket lookup if we already have one attached

2007-03-05 Thread KOVACS Krisztian
The UDP input code path looks up the UDP socket hash tables to find a
socket matching the incoming packet. However, as iptable_tproxy does
socket lookups early, the skb may already have the appropriate
reference attached. In that case we steal that reference instead of
doing the lookup.
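
The "steal the reference" pattern can be illustrated outside the kernel
with a small model. All names here (model_skb, model_sock,
model_hash_lookup, model_get_sock) are hypothetical stand-ins; the point is
only that ownership of the cached reference moves from the buffer to the
caller, so the destructor must be cleared together with the pointer.

```c
#include <assert.h>
#include <stddef.h>

struct model_sock {
    int id;
};

struct model_skb {
    struct model_sock *sk;                   /* reference cached by an earlier hook */
    void (*destructor)(struct model_skb *);  /* would drop that reference on free */
};

static struct model_sock fallback_sock = { 42 };
static int lookups_done;

/* Stand-in for the socket hash lookup; counts how often it runs. */
static struct model_sock *model_hash_lookup(void)
{
    lookups_done++;
    return &fallback_sock;
}

/* Take over the cached reference if there is one, else do the lookup. */
static struct model_sock *model_get_sock(struct model_skb *skb)
{
    if (skb->sk != NULL) {
        struct model_sock *sk = skb->sk;

        /* The caller now owns the reference; the skb must not drop it. */
        skb->destructor = NULL;
        skb->sk = NULL;
        return sk;
    }
    return model_hash_lookup();
}
```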

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 net/ipv4/udp.c |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ce6c460..1d15edc 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1226,8 +1226,15 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct hlist_head udptable[],
if(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udptable);
 
-   sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
-  skb->dev->ifindex, udptable);
+   if (skb->sk) {
+   /* steal reference */
+   sk = skb->sk;
+   skb->destructor = NULL;
+   skb->sk = NULL;
+   } else {
+   sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
+  skb->dev->ifindex, udptable);
+   }
 
if (sk != NULL) {
int ret = udp_queue_rcv_skb(sk, skb);



[PATCH/RFC 05/13] Loosen source address check on IPv4 output

2007-03-05 Thread KOVACS Krisztian
ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. This obviously makes using
such addresses impossible.

This patch introduces a flowi flag which makes omitting this check
possible. The new flag provides a way of handling transparent and
non-transparent connections differently.
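
To see why the flag also has to take part in route cache key comparison,
consider this simplified model. The structure flow_key_model and the
function flow_keys_match() are hypothetical; only the two flag values are
taken from the patch. Two otherwise identical flows must not share a cache
entry if exactly one of them is transparent, while other flag bits stay
irrelevant for matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLOWI_FLAG_MULTIPATHOLDROUTE 0x01
#define FLOWI_FLAG_TRANSPARENT       0x02

/* Hypothetical, much reduced stand-in for struct flowi. */
struct flow_key_model {
    uint32_t dst;
    uint32_t src;
    int      oif;
    int      iif;
    uint8_t  flags;
};

/* XOR every field; any non-zero bit means the keys differ. */
static bool flow_keys_match(const struct flow_key_model *a,
                            const struct flow_key_model *b)
{
    return ((a->dst ^ b->dst) |
            (a->src ^ b->src) |
            (uint32_t)(a->oif ^ b->oif) |
            (uint32_t)(a->iif ^ b->iif) |
            /* Only the transparent bit is part of the key. */
            (uint32_t)((a->flags ^ b->flags) & FLOWI_FLAG_TRANSPARENT)) == 0;
}
```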

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/net/flow.h |1 +
 net/ipv4/route.c   |8 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index ce4b10d..9eb91f2 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -49,6 +49,7 @@ struct flowi {
__u8proto;
__u8flags;
 #define FLOWI_FLAG_MULTIPATHOLDROUTE 0x01
+#define FLOWI_FLAG_TRANSPARENT 0x02
union {
struct {
__be16  sport;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c526fb2..8091a96 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -572,7 +572,8 @@ static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
(*(u16 *)&fl1->nl_u.ip4_u.tos ^
 *(u16 *)&fl2->nl_u.ip4_u.tos) |
(fl1->oif ^ fl2->oif) |
-   (fl1->iif ^ fl2->iif)) == 0;
+   (fl1->iif ^ fl2->iif) |
+   ((fl1->flags ^ fl2->flags) & FLOWI_FLAG_TRANSPARENT)) == 0;
 }
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
@@ -2338,6 +2339,7 @@ static inline int __mkroute_output(struct rtable **result,
rth->fl.fl4_src = oldflp->fl4_src;
rth->fl.oif = oldflp->oif;
rth->fl.mark= oldflp->mark;
+   rth->fl.flags   = oldflp->flags;
rth->rt_dst = fl->fl4_dst;
rth->rt_src = fl->fl4_src;
rth->rt_iif = oldflp->oif ? : dev_out->ifindex;
@@ -2482,6 +2484,7 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
  RT_SCOPE_LINK :
  RT_SCOPE_UNIVERSE),
  } },
+   .flags = oldflp->flags,
.mark = oldflp->mark,
.iif = loopback_dev.ifindex,
.oif = oldflp->oif };
@@ -2506,7 +2509,7 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
 
/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
dev_out = ip_dev_find(oldflp->fl4_src);
-   if (dev_out == NULL)
+   if (dev_out == NULL && !(oldflp->flags & FLOWI_FLAG_TRANSPARENT))
goto out;
 
/* I removed check for oif == dev_out->oif here.
@@ -2678,6 +2681,7 @@ int __ip_route_output_key(struct rtable **rp, const struct flowi *flp)
rth->fl.iif == 0 &&
rth->fl.oif == flp->oif &&
rth->fl.mark == flp->mark &&
+   !((rth->fl.flags ^ flp->flags) & FLOWI_FLAG_TRANSPARENT) &&
!((rth->fl.fl4_tos ^ flp->fl4_tos) &
(IPTOS_RT_MASK | RTO_ONLINK))) {
 



[PATCH/RFC 03/13] Don't do the TCP socket lookup if we already have one attached

2007-03-05 Thread KOVACS Krisztian
The TCP input code path looks up the TCP socket hash tables to find a
socket matching the incoming packet. However, as iptable_tproxy does
socket lookups early, the skb may already have the appropriate
reference attached. In that case we steal that reference instead of
doing the lookup.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 net/ipv4/tcp_ipv4.c |   13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0ba74bb..536db7b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1647,9 +1647,16 @@ int tcp_v4_rcv(struct sk_buff *skb)
TCP_SKB_CB(skb)->flags   = skb->nh.iph->tos;
TCP_SKB_CB(skb)->sacked  = 0;
 
-   sk = __inet_lookup(&tcp_hashinfo, skb->nh.iph->saddr, th->source,
-  skb->nh.iph->daddr, th->dest,
-  inet_iif(skb));
+   if (unlikely(skb->sk)) {
+   /* steal reference */
+   sk = skb->sk;
+   skb->destructor = NULL;
+   skb->sk = NULL;
+   } else {
+   sk = __inet_lookup(&tcp_hashinfo, skb->nh.iph->saddr, th->source,
+  skb->nh.iph->daddr, th->dest,
+  inet_iif(skb));
+   }
 
if (!sk)
goto no_tcp_socket;



[PATCH/RFC 02/13] Port redirection support for TCP

2007-03-05 Thread KOVACS Krisztian
Current TCP code relies on the local port of the listening socket
being the same as the destination port of the incoming
connection. Port redirection used by many transparent proxying
techniques obviously breaks this, so we have to store the original
destination port.

This patch extends struct inet_request_sock and stores the incoming
destination port value there. It also modifies the handshake code to
use that value as the source port when sending reply packets.
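
The effect can be modelled in userland as follows. The structures and
function names are hypothetical simplifications of inet_request_sock and
the child socket; what the sketch tries to show is that the port from the
incoming SYN, not the listener's port, ends up on the child, and that the
host-order copy (num) needs ntohs() while sport stays in network order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the request socket fields touched by the patch. */
struct model_req {
    uint16_t loc_port;  /* network byte order, from the SYN's dest port */
    uint16_t rmt_port;  /* network byte order, from the SYN's source port */
};

/* Hypothetical model of the child socket created after the handshake. */
struct model_child {
    uint16_t num;       /* local port, host byte order */
    uint16_t sport;     /* local port, network byte order */
    uint16_t dport;     /* remote port, network byte order */
};

/* Mirrors the idea of tcp_openreq_init(): remember both ports of the SYN. */
static void model_req_init(struct model_req *req,
                           uint16_t th_source, uint16_t th_dest)
{
    req->rmt_port = th_source;
    req->loc_port = th_dest;
}

/* Mirrors the idea of inet_csk_clone(): take the ports from the request. */
static void model_child_init(struct model_child *child,
                             const struct model_req *req)
{
    child->dport = req->rmt_port;
    child->num   = ntohs(req->loc_port);
    child->sport = req->loc_port;
}
```

With a listener on, say, port 3128 and a client connecting to port 80, the
child in this model reports port 80 as its local port, which is what the
client expects to see in the SYN+ACK.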

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/net/inet_sock.h |1 +
 include/net/tcp.h   |1 +
 net/ipv4/inet_connection_sock.c |2 ++
 net/ipv4/syncookies.c   |1 +
 net/ipv4/tcp_output.c   |2 +-
 5 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index ce6da97..0bd167b 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -64,6 +64,7 @@ struct inet_request_sock {
 #endif
__be32  loc_addr;
__be32  rmt_addr;
+   __be16  loc_port;
__be16  rmt_port;
u16 snd_wscale : 4, 
rcv_wscale : 4, 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5c472f2..e1cb3d0 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -982,6 +982,7 @@ static inline void tcp_openreq_init(struct request_sock *req,
ireq->acked = 0;
ireq->ecn_ok = 0;
ireq->rmt_port = skb->h.th->source;
+   ireq->loc_port = skb->h.th->dest;
 }
 
 extern void tcp_enter_memory_pressure(void);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 43fb160..83ad972 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -502,6 +502,8 @@ struct sock *inet_csk_clone(struct sock *sk, const struct request_sock *req,
newicsk->icsk_bind_hash = NULL;
 
inet_sk(newsk)->dport = inet_rsk(req)->rmt_port;
+   inet_sk(newsk)->num = ntohs(inet_rsk(req)->loc_port);
+   inet_sk(newsk)->sport = inet_rsk(req)->loc_port;
newsk->sk_write_space = sk_stream_write_space;
 
newicsk->icsk_retransmits = 0;
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 33016cc..431c81d 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -223,6 +223,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
treq->rcv_isn   = ntohl(skb->h.th->seq) - 1;
treq->snt_isn   = cookie;
req->mss= mss;
+   ireq->loc_port  = skb->h.th->dest;
ireq->rmt_port  = skb->h.th->source;
ireq->loc_addr  = skb->nh.iph->daddr;
ireq->rmt_addr  = skb->nh.iph->saddr;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index dc15113..a3ea7a1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2135,7 +2135,7 @@ struct sk_buff * tcp_make_synack(struct sock *sk, struct dst_entry *dst,
th->syn = 1;
th->ack = 1;
TCP_ECN_make_synack(req, th);
-   th->source = inet_sk(sk)->sport;
+   th->source = ireq->loc_port;
th->dest = ireq->rmt_port;
TCP_SKB_CB(skb)->seq = tcp_rsk(req)->snt_isn;
TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + 1;



[PATCH/RFC 01/13] Implement local diversion of IPv4 skbs

2007-03-05 Thread KOVACS Krisztian
The input path for non-local bound sockets requires diverting certain
packets locally, even if their destination IP address is not
considered local. We achieve this by assigning a specially crafted dst
entry to these skbs, and optionally also attaching a socket to the skb
so that the upper layer code does not need to redo the socket lookup.

We also have to be able to differentiate between these fake entries
and "real" entries in the cache: it is perfectly legal that the
diversion is done only for certain TCP or UDP packets and not for all
packets of the flow. Since these special dst entries are used only by
the iptables tproxy code, and that code uses exclusively these
entries, simply flagging these entries as DST_DIVERTED is OK. All
other cache lookup paths skip diverted entries, while our new
ip_divert_local() function uses exclusively diverted dst entries.
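
A reduced model of this separation is sketched below. The chain structure
and model_cache_find() are hypothetical; only the DST_DIVERTED value comes
from the patch. A regular lookup passes want_diverted = false and never
sees the fake entries, while the diversion path passes true and sees
nothing else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DST_DIVERTED 0x20

/* Hypothetical, reduced stand-in for a route cache chain entry. */
struct model_rt {
    uint32_t         daddr;
    unsigned int     flags;
    struct model_rt *next;
};

/*
 * Walk one hash chain and return the first entry for daddr whose
 * DST_DIVERTED state equals want_diverted, or NULL if there is none.
 */
static struct model_rt *model_cache_find(struct model_rt *chain,
                                         uint32_t daddr, bool want_diverted)
{
    for (struct model_rt *rt = chain; rt != NULL; rt = rt->next) {
        bool diverted = (rt->flags & DST_DIVERTED) != 0;

        if (rt->daddr == daddr && diverted == want_diverted)
            return rt;
    }
    return NULL;
}
```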

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/net/dst.h   |1 
 include/net/route.h |2 +
 net/ipv4/route.c|  113 +++
 3 files changed, 115 insertions(+), 1 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index e12a8ce..4cd0745 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -48,6 +48,7 @@ struct dst_entry
 #define DST_NOPOLICY   4
 #define DST_NOHASH 8
 #define DST_BALANCED0x10
+#define DST_DIVERTED   0x20
unsigned long   expires;
 
unsigned short  header_len; /* more space at head required */
diff --git a/include/net/route.h b/include/net/route.h
index 749e4df..efaa6b2 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -125,6 +125,8 @@ extern int  ip_rt_ioctl(unsigned int cmd, void __user *arg);
 extern void ip_rt_get_source(u8 *src, struct rtable *rt);
 extern int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb);
 
+extern int ip_divert_local(struct sk_buff *skb, const struct in_device *in, struct sock *sk);
+
 struct in_ifaddr;
 extern void fib_add_ifaddr(struct in_ifaddr *);
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 37e0d4d..c526fb2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -100,6 +100,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -941,9 +942,11 @@ restart:
while ((rth = *rthp) != NULL) {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
if (!(rth->u.dst.flags & DST_BALANCED) &&
+   !((rt->u.dst.flags ^ rth->u.dst.flags) & DST_DIVERTED) &&
compare_keys(&rth->fl, &rt->fl)) {
 #else
-   if (compare_keys(&rth->fl, &rt->fl)) {
+   if (!((rt->u.dst.flags ^ rth->u.dst.flags) & DST_DIVERTED) &&
+   compare_keys(&rth->fl, &rt->fl)) {
 #endif
/* Put it first */
*rthp = rth->u.dst.rt_next;
@@ -1165,6 +1168,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
if (rth->fl.fl4_dst != daddr ||
rth->fl.fl4_src != skeys[i] ||
rth->fl.oif != ikeys[k] ||
+   (rth->u.dst.flags & DST_DIVERTED) ||
rth->fl.iif != 0) {
rthp = &rth->u.dst.rt_next;
continue;
@@ -1525,6 +1529,111 @@ static int ip_rt_bug(struct sk_buff *skb)
return 0;
 }
 
+static void ip_divert_free_sock(struct sk_buff *skb)
+{
+   struct sock *sk = skb->sk;
+
+   skb->sk = NULL;
+   skb->destructor = NULL;
+
+   if (sk) {
+   /* TIME_WAIT inet sockets have to be handled differently */
+   if (((sk->sk_protocol == IPPROTO_TCP) && (sk->sk_state == TCP_TIME_WAIT)) ||
+   ((sk->sk_protocol == IPPROTO_DCCP) && (sk->sk_state == DCCP_TIME_WAIT)))
+   inet_twsk_put(inet_twsk(sk));
+   else
+   sock_put(sk);
+   }
+}
+
+int ip_divert_local(struct sk_buff *skb, const struct in_device *in, struct sock *sk)
+{
+   struct iphdr *iph = skb->nh.iph;
+   struct rtable *rth, *rtres;
+   unsigned hash;
+   const int iif = in->dev->ifindex;
+   u_int8_t tos;
+   int err;
+
+   /* look up hash first */
+   tos = iph->tos & IPTOS_RT_MASK;
+   hash = rt_hash_code(iph->daddr, iph->saddr ^ (iif << 5));
+
+   rcu_read_lock();
+   for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
+rth = rcu_dereference(rth->u.dst.rt_next)) {
+   if (rth->fl.fl4_dst == iph->daddr &&
+   rth-

[PATCH/RFC 00/13] Transparent proxying patches, take two

2007-03-05 Thread KOVACS Krisztian
  Hi,

These patches are my second try at providing Linux 2.2-like transparent
proxying support for Linux 2.6.

Major changes since the first version:

- iptable_tproxy now does IPv4 fragment reassembly (necessary for
  processing TCP/UDP header)

- The removal of the source address check in ip_route_output() was
  incorrect.  Instead, I've implemented a separate setsockopt-settable
  per-socket flag (setting it requires CAP_NET_ADMIN) to selectively
  loosen that check in ip_route_output().

Besides these, I've tried to fix all the problems raised on netdev@ in
January.

Unfortunately the newly introduced IP_TRANSPARENT socket option leads to
a quite intrusive set of patches touching core IPv4 routing and TCP
code. However, this was necessary as DaveM rejected our idea of using
IP_FREEBIND instead (and he's right, of course, as it would have caused
ABI breakage). The current approach works by adding a new bit to the
flags field in "struct flowi".

Furthermore, I haven't removed the IPv4 routing local diversion code
(caching socket lookups in the skb) yet. Patrick recommended throwing it
out altogether and using mark-based policy routing instead, but I still
think that would harm usability, as the user would need to harmonize
the configuration of two completely independent subsystems to make
them interoperate.

-- 
 Regards,
  Krisztian Kovacs


TCP minisock tcp_create_openreq_child() typo?

2007-02-28 Thread KOVACS Krisztian

  Hi,

  While reading TCP minisock code I've found this suspicious-looking
code fragment:

- 8< -
struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
{
struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);

if (newsk != NULL) {
const struct inet_request_sock *ireq = inet_rsk(req);
struct tcp_request_sock *treq = tcp_rsk(req);
struct inet_connection_sock *newicsk = inet_csk(sk);
struct tcp_sock *newtp;
- 8< -

  The above code initializes newicsk to inet_csk(sk); isn't that supposed
to be inet_csk(newsk)?  As far as I can tell this might leave
icsk_ack.last_seg_size zero even if we have received data.

-- 
 Regards,
  Krisztian Kovacs


IP_FREEBIND and CAP_NET_ADMIN (was: Re: [PATCH/RFC 05/10] Remove local address check on IP output)

2007-02-06 Thread KOVACS Krisztian
On Wednesday 10 January 2007 07:47, Patrick McHardy wrote:
> KOVACS Krisztian wrote:
> > ip_route_output() contains a check to make sure that no flows with
> > non-local source IP addresses are routed. Unfortunately this check
> > makes it completely impossible to use non-local bound sockets as no
> > outbound packets will make through the stack.
> >
> > This patch moves the interface lookup to the multicast-specific code
> > path as that is the only real user of the interface data looked up.
> >
> > Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>
> >
> > ---
> >
> >  net/ipv4/route.c |   13 +
> >  1 files changed, 5 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 537b976..bb1158a 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -2498,11 +2498,6 @@ #endif
> > ZERONET(oldflp->fl4_src))
> > goto out;
> >
> > -   /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
> > -   dev_out = ip_dev_find(oldflp->fl4_src);
> > -   if (dev_out == NULL)
> > -   goto out;
> > -
>
> I'm not sure how exactly this is used by applications, but couldn't you
> restrict this to sockets without freebind?

As it has turned out since I submitted this patch, simply removing the
branch in the quoted patch above is not good, as that would allow all local
users to generate connections from a non-local IP address. (Setting
IP_FREEBIND does not require CAP_NET_ADMIN.)
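
For reference, this is roughly how an unprivileged process can use
IP_FREEBIND today. The sketch assumes a libc that defines IP_FREEBIND in
<netinet/in.h>; the helper name try_bind() and the address used in the
usage note are made up for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to bind a TCP socket to the dotted-quad
 * address addr, optionally setting IP_FREEBIND first.
 * Returns 0 on success or -errno on failure.
 */
static int try_bind(const char *addr, int use_freebind)
{
    struct sockaddr_in sin;
    int one = 1;
    int ret = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -errno;

    if (use_freebind &&
        setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &one, sizeof(one)) < 0)
        ret = -errno;

    if (ret == 0) {
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = 0;               /* let the kernel pick a port */
        if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
            ret = -EINVAL;
        else if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
            ret = -errno;
    }

    close(fd);
    return ret;
}
```

On a typical system, try_bind("192.0.2.1", 0) is expected to fail with
EADDRNOTAVAIL when that address is not configured locally, while
try_bind("192.0.2.1", 1) is expected to succeed without any special
capability.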

I've attempted to restrict the removal of the check to certain sockets,
but it is more difficult than expected. It would require touching a lot of
areas of the kernel code, as the socket is not available at the points where
an output routing lookup is requested.

In fact the only thing available when making the decision in 
ip_route_output_slow() is a struct flowi. I've tried to stuff a flag bit 
into "struct flowi", but that solution seems to be very risky, as the 
value of "struct flowi->flags" is not consulted in a lot of places. IMHO 
the result would be far from pretty... (And I have to admit that I don't 
really know what flowi->flags is used for. I've found no in-tree user of 
that field. The only defined flag bit, FLOWI_FLAG_MULTIPATHOLDROUTE, has 
no in-tree user either.)

And even if we have this flag in place, it's not enough to set it for 
certain sockets in ip_route_connect(): this would not handle SYN+ACK or 
ACK packets sent in response to redirected TCP connection attempts. And 
who knows what else is still hiding there: ip_route_output_*() calls are 
pretty much everywhere in the whole net/ipv4 directory.

So I think the cleanest solution would be to require CAP_NET_ADMIN for 
IP_FREEBIND. This way, a non-root process would not be allowed to bind a 
socket to a non-local address, and thus could not initiate connections 
from a non-local IP.
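
To make the concern concrete, here is a minimal userspace sketch of what
any local user can do with IP_FREEBIND today. The helper name
bind_nonlocal() is made up for illustration and is not part of any patch;
192.0.2.10 in the note below is a documentation address assumed not to be
configured on the machine.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket and bind it to ip:port,
 * optionally enabling IP_FREEBIND first.
 * Returns the socket fd (>= 0) on success or a negative errno value.
 */
int bind_nonlocal(const char *ip, uint16_t port, int freebind)
{
	struct sockaddr_in sin;
	int one = 1;
	int fd, err;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
		return -EINVAL;		/* not a dotted-quad IPv4 address */

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;

	/* This is the call that currently needs no capability at all. */
	if (freebind &&
	    setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &one, sizeof(one)) < 0) {
		err = errno;
		close(fd);
		return -err;
	}

	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
		err = errno;
		close(fd);
		return -err;
	}

	return fd;
}
```

On a stock configuration I would expect bind_nonlocal("192.0.2.10", 8080, 0)
to fail with -EADDRNOTAVAIL and the same call with freebind set to succeed
for an unprivileged user. With the CAP_NET_ADMIN requirement suggested
above, the setsockopt() step would presumably fail with -EPERM instead.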

As this would be a change in the kernel ABI, Balazs and I have tried to 
search for applications using the IP_FREEBIND option using Google 
codesearch (www.google.com/codesearch).

Outside libc and kernel, we've found only three applications that mention
this option:
* socat: which allows setting all socket options by the user (I doubt 
using IP_FREEBIND with socat has any meaningful use)
* strace: to be able to dump IP_FREEBIND
* qemu: for emulating Linux system calls

None of these requires IP_FREEBIND as core functionality, and all of them 
will probably keep working if IP_FREEBIND is tied to CAP_NET_ADMIN.

So the question is: shall we take the IP_FREEBIND approach, which would 
change a hardly ever used interface by requiring the CAP_NET_ADMIN 
capability, or should we try to find all the scattered places in the 
Linux IP stack which do a route lookup?

-- 
 Regards,
  Krisztian Kovacs
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs

2007-01-16 Thread KOVACS Krisztian

  Hi,

On Wednesday 10 January 2007 13:19, Patrick McHardy wrote:
> >   Of course it's true that doing early lookups and storing that
> > reference in the skb widens the window considerably, but I think this
> > race is already handled. Or is there anything I don't see?
>
> You're right, it seems to be handled properly (except I think there is
> a race between sk_common_release calling xfrm_sk_free_policy and f.e.
> udp calling __xfrm_policy_check, will look into that).
>
> It probably shouldn't be cached anyway, with nf_queue for example
> the window could be _really_ large.

  Patrick, I seem to be out of ideas how this could be done 
without "caching" the socket lookup. The problem is that it's not only 
caching in some cases. For example we can do something like this:

  iptables -t tproxy -A PREROUTING -s X -d Y -p tcp --dport 80 \
   -j TPROXY --to proxy_ip:proxy_port

  In this case the TPROXY target does a socket lookup for 
proxy_ip:proxy_port and stores that socket reference in skb->sk. 
Obviously if you don't do this then TCP will do a lookup on the packet's 
original destination address/port and it won't work.

  Unfortunately I don't see any way this could be solved without 
storing the result of the lookup... So while I agree that having that 
socket reference in the skb is risky, as previously skb->sk was unused on 
the input path, I simply don't have any other idea. (Unless you load 
iptable_tproxy, skb->sk==NULL on input is still true with these patches, 
so I think there should be absolutely no problems with tproxy unused.)

  Other possible problems which came to my mind:

- The previous version was missing IPv4 fragment reassembly: we obviously 
need this to be able to do socket lookups, so now I've added this to 
iptable_tproxy.
- IP_FREEBIND does not require the NET_ADMIN capability; combined with the 
relaxed source address check on ip_output() this means that we provide a 
way to do IPv4 address forging for unprivileged users. As we must not 
break anything, it looks like we need a separate socket option for 
disabling output source address checks (this would obviously require 
NET_ADMIN).

  Thoughts? I'd be especially interested in any ideas wrt. the socket 
reference problems, as the other two seem to be easier to solve.

-- 
 Regards,
  Krisztian Kovacs


Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs

2007-01-11 Thread KOVACS Krisztian

  Hi,

On Wednesday 10 January 2007 13:32, Patrick McHardy wrote:
> How exactly are dynamic ports handled? Do you just add a catch-all rule
> that filters based on socket lookups?
>
> In that case you could do something like this:
>
> ip route add local default dev lo scope host table 1
> ip rule add fwmark 0x1 lookup 1
>
> and still use the socket lookups for marking, which would (without the
> socket caching) remove the need for this patch entirely.

  Ok, I'll try to address all the concerns raised on the list.

  Thanks a lot for the review and comments.

-- 
 Regards,
  Krisztian Kovacs


Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs

2007-01-10 Thread KOVACS Krisztian

  Hi,

On Wednesday 10 January 2007 07:46, Patrick McHardy wrote:
> > +   rcu_read_lock();
> > +   for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
> > +rth = rcu_dereference(rth->u.rt_next)) {
> > +   if (rth->fl.fl4_dst == iph->daddr &&
> > +   rth->fl.fl4_src == iph->saddr &&
> > +   rth->fl.iif == iif &&
> > +   rth->fl.oif == 0 &&
> > +   rth->fl.mark == skb->mark &&
> > +   (rth->u.dst.flags & DST_DIVERTED) &&
> > +   rth->fl.fl4_tos == tos) {
>
> Mark and tos look unnecessary here since they don't affect the further
> processing of the packet.

  Indeed, thanks for spotting it.

> > +   rth->u.dst.lastuse = jiffies;
> > +   dst_hold(&rth->u.dst);
> > +   rth->u.dst.__use++;
> > +   RT_CACHE_STAT_INC(in_hit);
> > +   rcu_read_unlock();
> > +
> > +   dst_release(skb->dst);
> > +   skb->dst = (struct dst_entry*)rth;
> > +
> > +   if (sk) {
> > +   sock_hold(sk);
> > +   skb->sk = sk;
>
> This looks racy, the socket could be closed between the lookup and
> the actual use. Why do you need the socket lookup at all, can't
> you just divert all packets selected by iptables?

  Yes, it's racy, but I think this is true for the "regular" socket lookup, too. 
Take UDP for example: __udp4_lib_rcv() does the socket lookup, gets a 
reference to the socket, and then calls udp_queue_rcv_skb() to queue the 
skb. As far as I can see there's nothing there which prevents the socket 
from being closed between these calls. sk_common_release() even documents 
this behaviour:

[...]
if (sk->sk_prot->destroy)
sk->sk_prot->destroy(sk);

/*
 * Observation: when sock_common_release is called, processes have
 * no access to socket. But net still has.
 * Step one, detach it from networking:
 *
 * A. Remove from hash tables.
 */

sk->sk_prot->unhash(sk);

/*
 * In this point socket cannot receive new packets, but it is possible
 * that some packets are in flight because some CPU runs receiver and
 * did hash table lookup before we unhashed socket. They will achieve
 * receive queue and will be purged by socket destructor.
 *
 * Also we still have packets pending on receive queue and probably,
 * our own packets waiting in device queues. sock_destroy will drain
 * receive queue, but transmitted packets will delay socket destruction
 * until the last reference will be released.
 */
[...]

  Of course it's true that doing early lookups and storing that reference 
in the skb widens the window considerably, but I think this race is 
already handled. Or is there anything I don't see?
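
  The argument can be illustrated with a toy, single-threaded model of
the reference counting involved. This is only a sketch: the toy_* names
are invented for illustration, and the real code uses atomic counters and
locking that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct sock: just a refcount and two state bits. */
struct toy_sock {
	int  refcnt;	/* starts at 1: the reference held by the owner */
	bool hashed;	/* still reachable via the lookup tables? */
	bool destroyed;	/* set once the last reference is dropped */
};

void toy_sock_hold(struct toy_sock *sk)
{
	sk->refcnt++;
}

void toy_sock_put(struct toy_sock *sk)
{
	if (--sk->refcnt == 0)
		sk->destroyed = true;	/* stands in for the destructor */
}

/* Receive path: a successful lookup returns the socket with a reference. */
struct toy_sock *toy_lookup(struct toy_sock *sk)
{
	if (!sk->hashed)
		return NULL;
	toy_sock_hold(sk);
	return sk;
}

/* close(): unhash first, then drop the owner's reference. */
void toy_release(struct toy_sock *sk)
{
	sk->hashed = false;
	toy_sock_put(sk);
}
```

  If a receiver did toy_lookup() before toy_release() ran, the socket is
unhashed but not destroyed until the receiver calls toy_sock_put(). Holding
the reference in the skb for longer stretches that interval but does not
change the rule.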

-- 
 Regards,
  Krisztian Kovacs


Re: [PATCH/RFC 05/10] Remove local address check on IP output

2007-01-10 Thread KOVACS Krisztian

  Hi,

On Wednesday 10 January 2007 07:47, Patrick McHardy wrote:
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 537b976..bb1158a 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -2498,11 +2498,6 @@ #endif
> > ZERONET(oldflp->fl4_src))
> > goto out;
> >
> > -   /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
> > -   dev_out = ip_dev_find(oldflp->fl4_src);
> > -   if (dev_out == NULL)
> > -   goto out;
> > -
>
> I'm not sure how exactly this is used by applications, but couldn't you
> restrict this to sockets without freebind?

  I'll try to do so in the next incarnation of the patches. Thanks for the 
comment, it'd indeed be safer to do so.

  BTW, could anyone shed some light on exactly why that check is 
necessary? As far as I can see it prevents packets with a non-local 
source address being routed -- but I fail to see why we need to prevent 
that.

-- 
 Regards,
  Krisztian Kovacs


Re: [PATCH/RFC 00/10] Transparent proxying patches version 4

2007-01-08 Thread KOVACS Krisztian

  Hi Evgeniy,

On Wednesday 03 January 2007 18:23, Evgeniy Polyakov wrote:
> Out of curiosity, would you use netchannels [1] if the implementation
> will be much broader? Since what you have created works exactly like
> netchannels netfilter NAT target (although it does not change ports,
> but it can be trivially extended), but without all existing netfilter
> overhead and without hacks in core TCP/UDP/IP/route code.

  Indeed, a netchannels based implementation would be very nice. Combined 
with a userspace network stack I think this could be a very powerful 
tool, especially for people doing dirty tricks -- like transparent 
proxying in our case.

  However, I think that adopting netchannels now would be an enormous 
amount of work on our part. Of course, personally I'm really interested in 
netchannels and the related projects, but I agree with Harald that we 
still have a long way to go before being able to switch to netchannels. 
And I definitely _hate_ the previous incarnations of our tproxy patches 
enough that even this patchset seems acceptable to me. ;)

-- 
 Regards,
  Krisztian Kovacs


Re: [PATCH/RFC 00/10] Transparent proxying patches version 4

2007-01-04 Thread KOVACS Krisztian

  Hi,

On Wednesday 03 January 2007 20:33, Lennert Buytenhek wrote:
> I'd also love to see the old tproxy API go away entirely.  It was
> always a bit of a pain to use.

  It's gone with these patches: all you need is to bind() to foreign 
addresses, like in the Linux 2.2 days.

-- 
 Regards,
  Krisztian Kovacs


[PATCH/RFC 05/10] Remove local address check on IP output

2007-01-03 Thread KOVACS Krisztian
ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. Unfortunately this check
makes it completely impossible to use non-local bound sockets as no
outbound packets will make it through the stack.

This patch moves the interface lookup to the multicast-specific code
path as that is the only real user of the interface data looked up.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 net/ipv4/route.c |   13 +
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 537b976..bb1158a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2498,11 +2498,6 @@ #endif
ZERONET(oldflp->fl4_src))
goto out;
 
-   /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
-   dev_out = ip_dev_find(oldflp->fl4_src);
-   if (dev_out == NULL)
-   goto out;
-
/* I removed check for oif == dev_out->oif here.
   It was wrong for two reasons:
   1. ip_dev_find(saddr) can return wrong iface, if saddr is
@@ -2528,12 +2523,14 @@ #endif
   Luckily, this hack is good workaround.
 */
 
+   /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+   dev_out = ip_dev_find(oldflp->fl4_src);
+   if (dev_out == NULL)
+   goto out;
+
fl.oif = dev_out->ifindex;
goto make_route;
}
-   if (dev_out)
-   dev_put(dev_out);
-   dev_out = NULL;
}
 
 


[PATCH/RFC 03/10] Don't do the TCP socket lookup if we already have one attached

2007-01-03 Thread KOVACS Krisztian
TCP input code path looks up the TCP socket hash tables to find a
socket matching the incoming packet. However, as iptable_tproxy does
socket lookups early, the skb may already have the appropriate
reference attached; in that case we steal that reference instead of
doing the lookup.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 net/ipv4/tcp_ipv4.c |   13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index bf7a224..7828aec 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1647,9 +1647,16 @@ int tcp_v4_rcv(struct sk_buff *skb)
TCP_SKB_CB(skb)->flags   = skb->nh.iph->tos;
TCP_SKB_CB(skb)->sacked  = 0;
 
-   sk = __inet_lookup(&tcp_hashinfo, skb->nh.iph->saddr, th->source,
-  skb->nh.iph->daddr, th->dest,
-  inet_iif(skb));
+   if (unlikely(skb->sk)) {
+   /* steal reference */
+   sk = skb->sk;
+   skb->destructor = NULL;
+   skb->sk = NULL;
+   } else {
+   sk = __inet_lookup(&tcp_hashinfo, skb->nh.iph->saddr, th->source,
+  skb->nh.iph->daddr, th->dest,
+  inet_iif(skb));
+   }
 
if (!sk)
goto no_tcp_socket;
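
The "steal reference" branch can be tried out in isolation with a small
userspace model. This is only a sketch: the toy_* types and
toy_hash_lookup() are invented stand-ins for struct sk_buff, struct sock
and __inet_lookup(), and reference counting is left out.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock {
	int id;
};

struct toy_skb {
	struct toy_sock *sk;			/* set early by the tproxy hook, or NULL */
	void (*destructor)(struct toy_skb *skb);
};

/* Invented stand-in for the hash lookup: always finds listener id 1. */
struct toy_sock *toy_hash_lookup(const struct toy_skb *skb)
{
	static struct toy_sock listener = { 1 };

	(void)skb;
	return &listener;
}

/* Mirrors the branch added to tcp_v4_rcv() by this patch. */
struct toy_sock *toy_rcv_find_sock(struct toy_skb *skb)
{
	struct toy_sock *sk;

	if (skb->sk) {
		/* steal reference: the skb no longer owns the socket */
		sk = skb->sk;
		skb->destructor = NULL;
		skb->sk = NULL;
	} else {
		sk = toy_hash_lookup(skb);
	}

	return sk;
}
```

Clearing skb->sk and the destructor matters as much as taking the pointer:
after the hand-over only the receive path is responsible for dropping the
reference, so it is not released a second time when the skb is freed.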


[PATCH/RFC 06/10] Create a tproxy flag in struct sk_buff

2007-01-03 Thread KOVACS Krisztian
We would like to be able to match on whether or not a given packet has
been diverted by tproxy. To make this possible we need a flag in
sk_buff.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/linux/skbuff.h |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ff3940..6d7f5c7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -284,7 +284,8 @@ struct sk_buff {
nfctinfo:3;
__u8pkt_type:3,
fclone:2,
-   ipvs_property:1;
+   ipvs_property:1,
+   ip_tproxy:1;
__be16  protocol;
 
void(*destructor)(struct sk_buff *skb);


[PATCH/RFC 04/10] Don't do the UDP socket lookup if we already have one attached

2007-01-03 Thread KOVACS Krisztian
UDP input code path looks up the UDP socket hash tables to find a
socket matching the incoming packet. However, as iptable_tproxy does
socket lookups early, the skb may already have the appropriate
reference attached; in that case we steal that reference instead of
doing the lookup.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 net/ipv4/udp.c |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cfff930..1b348f5 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1225,8 +1225,15 @@ int __udp4_lib_rcv(struct sk_buff *skb,
if(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udptable);
 
-   sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
-  skb->dev->ifindex, udptable);
+   if (skb->sk) {
+   /* steal reference */
+   sk = skb->sk;
+   skb->destructor = NULL;
+   skb->sk = NULL;
+   } else {
+   sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
+  skb->dev->ifindex, udptable);
+   }
 
if (sk != NULL) {
int ret = udp_queue_rcv_skb(sk, skb);


[PATCH/RFC 02/10] Port redirection support for TCP

2007-01-03 Thread KOVACS Krisztian
Current TCP code relies on the local port of the listening socket
being the same as the destination port of the incoming
connection. Port redirection used by many transparent proxying
techniques obviously breaks this, so we have to store the original
destination port.

This patch extends struct inet_request_sock and stores the incoming
destination port value there. It also modifies the handshake code to
use that value as the source port when sending reply packets.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/net/inet_sock.h |1 +
 include/net/tcp.h   |1 +
 net/ipv4/inet_connection_sock.c |2 ++
 net/ipv4/tcp_output.c   |2 +-
 4 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index ce6da97..0bd167b 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -64,6 +64,7 @@ #if defined(CONFIG_IPV6) || defined(CONF
 #endif
__be32  loc_addr;
__be32  rmt_addr;
+   __be16  loc_port;
__be16  rmt_port;
u16 snd_wscale : 4, 
rcv_wscale : 4, 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b7d8317..08ea8f3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -983,6 +983,7 @@ static inline void tcp_openreq_init(stru
ireq->acked = 0;
ireq->ecn_ok = 0;
ireq->rmt_port = skb->h.th->source;
+   ireq->loc_port = skb->h.th->dest;
 }
 
 extern void tcp_enter_memory_pressure(void);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 9d68837..889a487 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -502,6 +502,8 @@ struct sock *inet_csk_clone(struct sock
newicsk->icsk_bind_hash = NULL;
 
inet_sk(newsk)->dport = inet_rsk(req)->rmt_port;
+   inet_sk(newsk)->num = ntohs(inet_rsk(req)->loc_port);
+   inet_sk(newsk)->sport = inet_rsk(req)->loc_port;
newsk->sk_write_space = sk_stream_write_space;
 
newicsk->icsk_retransmits = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 32c1a97..bb37048 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2132,7 +2132,7 @@ #endif
th->syn = 1;
th->ack = 1;
TCP_ECN_make_synack(req, th);
-   th->source = inet_sk(sk)->sport;
+   th->source = ireq->loc_port;
th->dest = ireq->rmt_port;
TCP_SKB_CB(skb)->seq = tcp_rsk(req)->snt_isn;
TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + 1;
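
The effect of the new loc_port field can be shown with a small userspace
model of the handshake. It is a sketch only: the toy_* structures are
invented for illustration and ports are kept in host byte order for
readability, unlike the __be16 fields in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Toy version of the two port fields in struct inet_request_sock. */
struct toy_req {
	uint16_t loc_port;	/* destination port of the incoming SYN */
	uint16_t rmt_port;	/* source port of the incoming SYN */
};

struct toy_tcphdr {
	uint16_t source;
	uint16_t dest;
};

/* Mirrors tcp_openreq_init(): remember both ports of the SYN. */
void toy_openreq_init(struct toy_req *req, const struct toy_tcphdr *syn)
{
	req->rmt_port = syn->source;
	req->loc_port = syn->dest;
}

/*
 * Mirrors the tcp_output.c hunk: the SYN+ACK source port comes from the
 * request, not from the port the listening socket is bound to.
 */
void toy_make_synack(struct toy_tcphdr *th, const struct toy_req *req)
{
	th->source = req->loc_port;
	th->dest = req->rmt_port;
}
```

With a client SYN from port 40000 to port 80 redirected to a proxy
listening on, say, port 50080, the reply still carries source port 80, so
the client sees the answer coming from the port it connected to.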


[PATCH/RFC 00/10] Transparent proxying patches version 4

2007-01-03 Thread KOVACS Krisztian
The following set of patches implement transparent proxying support
loosely modeled on the Linux 2.2 transparent proxying functionality.

In the last few years we've been maintaining a set of patches
implementing Netfilter NAT to provide similar functionality. However,
as time passed, more and more bugs surfaced, some of which were not
possible to fix using that approach. Also, those patches required
modification of user-space application code and the "API" provided was
neither clean nor easy to use.

So instead of using NAT to dynamically redirect traffic to local
addresses, we now rely on "native" non-locally-bound sockets and do
early socket lookups for inbound IPv4 packets. These lookups are done
in a separate Netfilter/iptables module, so there are only negligible
performance implications of building transparent proxying support as a
module and then not loading it.

Small modifications were also necessary in IP/TCP/UDP core code to
support the Netfilter modules. All those have been functionally split
out into stand-alone patches among which there are no direct
dependencies. Among these changes are ones which I think might be
potentially risky, especially the core IPv4 routing code changes.

Also please note that at the moment only IPv4 support is implemented,
but, as opposed to the NAT-based approach taken by older TProxy versions,
IPv6 support is possible this way.

Comments welcome...

-- 
 Regards,
  Krisztian Kovacs


[PATCH/RFC 08/10] iptables tproxy table

2007-01-03 Thread KOVACS Krisztian
The iptables tproxy table registers a new hook on PRE_ROUTING and does
the following for each incoming TCP/UDP packet:

1. Does a TCP/UDP socket hash lookup to decide whether or not the packet
   is sent to a non-local bound socket. If a matching socket is found
   and the socket has the IP_FREEBIND socket option enabled the skb is
   diverted locally and the socket reference is stored in the skb.

2. If no matching socket was found, the PREROUTING chain of the
   iptables tproxy table is consulted. Matching rules with the TPROXY
   target can do transparent redirection here. (In this case it is not
   necessary to have the IP_FREEBIND socket option enabled for the
   target socket, redirection takes place even for "regular"
   sockets. This way no modification of the application is necessary.)
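
The two steps above can be summarised as a small decision function. This
is a simplified userspace sketch with invented toy_* names, not the code
in iptable_tproxy.c. In particular, it assumes that a matching socket
without IP_FREEBIND is passed on untouched, which the description above
does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_verdict {
	TOY_PASS,	/* leave the packet to normal processing */
	TOY_DIVERT	/* deliver locally, socket stored in the skb */
};

struct toy_sock {
	bool freebind;	/* IP_FREEBIND enabled on this socket? */
};

/*
 * matched:     result of the socket hash lookup on the packet's own
 *              addresses and ports, or NULL
 * rule_target: socket selected by a matching TPROXY rule, or NULL
 */
enum toy_verdict toy_tproxy_hook(const struct toy_sock *matched,
				 const struct toy_sock *rule_target)
{
	/* Step 1: a non-locally bound socket wants this packet. */
	if (matched != NULL)
		return matched->freebind ? TOY_DIVERT : TOY_PASS;

	/* Step 2: no socket matched, the TPROXY rules decide. */
	if (rule_target != NULL)
		return TOY_DIVERT;	/* IP_FREEBIND not required here */

	return TOY_PASS;
}
```

Note the asymmetry: step 1 insists on IP_FREEBIND, while a socket chosen
by a TPROXY rule in step 2 is used regardless of that option.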

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/linux/netfilter_ipv4/ip_tproxy.h |   20 ++
 net/ipv4/netfilter/Kconfig   |   10 +
 net/ipv4/netfilter/Makefile  |1 
 net/ipv4/netfilter/iptable_tproxy.c  |  253 ++
 4 files changed, 284 insertions(+), 0 deletions(-)

diff --git a/include/linux/netfilter_ipv4/ip_tproxy.h b/include/linux/netfilter_ipv4/ip_tproxy.h
new file mode 100644
index 000..ae890e3
--- /dev/null
+++ b/include/linux/netfilter_ipv4/ip_tproxy.h
@@ -0,0 +1,20 @@
+#ifndef _IP_TPROXY_H
+#define _IP_TPROXY_H
+
+#include 
+
+/* look up and get a reference to a matching socket */
+extern struct sock *
+ip_tproxy_get_sock(const u8 protocol,
+  const __be32 saddr, const __be32 daddr,
+  const __be16 sport, const __be16 dport,
+  const struct net_device *in);
+
+/* divert skb to a given socket */
+extern int
+ip_tproxy_do_divert(struct sk_buff *skb,
+   const struct sock *sk,
+   const int require_freebind,
+   const struct net_device *in);
+
+#endif
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index f6026d4..312b0ef 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -652,6 +652,16 @@ config IP_NF_RAW
  If you want to compile it as a module, say M here and read
  .  If unsure, say `N'.
 
+# tproxy table
+config IP_NF_TPROXY
+   tristate "Transparent proxying"
+   depends on IP_NF_IPTABLES
+   help
+ Transparent proxying. For more information see
+ http://www.balabit.com/downloads/tproxy.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 # ARP tables
 config IP_NF_ARPTABLES
tristate "ARP tables support"
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 15e741a..aa57ce4 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -71,6 +71,7 @@ obj-$(CONFIG_IP_NF_MANGLE) += iptable_ma
 obj-$(CONFIG_IP_NF_NAT) += iptable_nat.o
 obj-$(CONFIG_NF_NAT) += iptable_nat.o
 obj-$(CONFIG_IP_NF_RAW) += iptable_raw.o
+obj-$(CONFIG_IP_NF_TPROXY) += iptable_tproxy.o
 
 # matches
 obj-$(CONFIG_IP_NF_MATCH_IPRANGE) += ipt_iprange.o
diff --git a/net/ipv4/netfilter/iptable_tproxy.c b/net/ipv4/netfilter/iptable_tproxy.c
new file mode 100644
index 000..6049c83
--- /dev/null
+++ b/net/ipv4/netfilter/iptable_tproxy.c
@@ -0,0 +1,253 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2006-2007 BalaBit IT Ltd.
+ * Author: Balazs Scheidler, Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define TPROXY_VALID_HOOKS (1 << NF_IP_PRE_ROUTING)
+
+#if 0
+#define DEBUGP printk
+#else
+#define DEBUGP(f, args...)
+#endif
+
+static struct
+{
+   struct ipt_replace repl;
+   struct ipt_standard entries[2];
+   struct ipt_error term;
+} initial_table __initdata = {
+   .repl = {
+   .name = "tproxy",
+   .valid_hooks = TPROXY_VALID_HOOKS,
+   .num_entries = 2,
+   .size = sizeof(struct ipt_standard) + sizeof(struct ipt_error),
+   .hook_entry = {
+   [NF_IP_PRE_ROUTING] = 0 },
+   .underflow = {
+   [NF_IP_PRE_ROUTING] = 0 },
+   },
+   .entries = {
+   /* PRE_ROUTING */
+   {
+   .entry = {
+   .target_offset = sizeof(struct ipt_entry),
+   .next_offset = sizeof(struct ipt_standard),
+   },
+   .target = {
+

[PATCH/RFC 10/10] iptables tproxy match

2007-01-03 Thread KOVACS Krisztian
Implements an iptables module which matches packets which have the
tproxy flag set, that is, packets diverted in the tproxy table.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 net/netfilter/Kconfig |9 +
 net/netfilter/Makefile|1 +
 net/netfilter/xt_tproxy.c |   77 +
 3 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 1b853c3..76c6f14 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -559,6 +559,15 @@ config NETFILTER_XT_MATCH_QUOTA
  If you want to compile it as a module, say M here and read
  .  If unsure, say `N'.
 
+config NETFILTER_XT_MATCH_TPROXY
+   tristate '"tproxy" match support'
+   depends on NETFILTER_XTABLES
+   help
+ This option adds a `tproxy' match, which allows you to match
+ packets which have been diverted to local sockets by TProxy.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_REALM
tristate  '"realm" match support'
depends on NETFILTER_XTABLES
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 5dc5574..4a83585 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MARK) +=
 obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_TPROXY) += xt_tproxy.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_SCTP) += xt_sctp.o
diff --git a/net/netfilter/xt_tproxy.c b/net/netfilter/xt_tproxy.c
new file mode 100644
index 000..53f8bee
--- /dev/null
+++ b/net/netfilter/xt_tproxy.c
@@ -0,0 +1,77 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2007 BalaBit IT Ltd.
+ * Author: Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+
+#include 
+
+static int
+match(const struct sk_buff *skb,
+  const struct net_device *in,
+  const struct net_device *out,
+  const struct xt_match *match,
+  const void *matchinfo,
+  int offset,
+  unsigned int protoff,
+  int *hotdrop)
+{
+   return skb->ip_tproxy;
+}
+
+static int
+check(const char *tablename,
+  const void *entry,
+  const struct xt_match *match,
+  void *matchinfo,
+  unsigned int hook_mask)
+{
+   return 1;
+}
+
+static struct xt_match tproxy_matches[] = {
+   {
+   .name   = "tproxy",
+   .match  = match,
+   .matchsize  = 0,
+   .checkentry = check,
+   .family = AF_INET,
+   .me = THIS_MODULE,
+   },
+   {
+   .name   = "tproxy",
+   .match  = match,
+   .matchsize  = 0,
+   .checkentry = check,
+   .family = AF_INET6,
+   .me = THIS_MODULE,
+   },
+};
+
+static int __init xt_tproxy_init(void)
+{
+   return xt_register_matches(tproxy_matches, ARRAY_SIZE(tproxy_matches));
+}
+
+static void __exit xt_tproxy_fini(void)
+{
+   xt_unregister_matches(tproxy_matches, ARRAY_SIZE(tproxy_matches));
+}
+
+module_init(xt_tproxy_init);
+module_exit(xt_tproxy_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Krisztian Kovacs <[EMAIL PROTECTED]>");
+MODULE_DESCRIPTION("iptables tproxy match module");
+MODULE_ALIAS("ipt_tproxy");
+MODULE_ALIAS("ip6t_tproxy");


[PATCH/RFC 09/10] iptables TPROXY target

2007-01-03 Thread KOVACS Krisztian
The TPROXY target implements redirection of non-local TCP/UDP traffic
to local sockets. It is simply a wrapper around functionality exported
from iptable_tproxy.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/linux/netfilter_ipv4/ipt_TPROXY.h |9 +++
 net/ipv4/netfilter/Kconfig|   11 +++
 net/ipv4/netfilter/Makefile   |1 
 net/ipv4/netfilter/ipt_TPROXY.c   |  103 +
 4 files changed, 124 insertions(+), 0 deletions(-)

diff --git a/include/linux/netfilter_ipv4/ipt_TPROXY.h b/include/linux/netfilter_ipv4/ipt_TPROXY.h
new file mode 100644
index 000..d05c956
--- /dev/null
+++ b/include/linux/netfilter_ipv4/ipt_TPROXY.h
@@ -0,0 +1,9 @@
+#ifndef _IPT_TPROXY_H_target
+#define _IPT_TPROXY_H_target
+
+struct ipt_tproxy_target_info {
+   u_int16_t lport;
+   u_int32_t laddr;
+};
+
+#endif
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 312b0ef..7f76ab6 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -662,6 +662,17 @@ config IP_NF_TPROXY
 
  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP_NF_TARGET_TPROXY
+   tristate "TPROXY target support"
+   depends on IP_NF_TPROXY
+   help
+ This option adds a `TPROXY' target, which is somewhat similar to
+ REDIRECT.  It can only be used in the tproxy table and is useful
+ to redirect traffic to a transparent proxy.  It does _not_ depend
+ on Netfilter connection tracking.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 # ARP tables
 config IP_NF_ARPTABLES
tristate "ARP tables support"
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index aa57ce4..851da93 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_IP_NF_TARGET_ULOG) += ipt_U
 obj-$(CONFIG_IP_NF_TARGET_TCPMSS) += ipt_TCPMSS.o
 obj-$(CONFIG_IP_NF_TARGET_CLUSTERIP) += ipt_CLUSTERIP.o
 obj-$(CONFIG_IP_NF_TARGET_TTL) += ipt_TTL.o
+obj-$(CONFIG_IP_NF_TARGET_TPROXY) += ipt_TPROXY.o
 
 # generic ARP tables
 obj-$(CONFIG_IP_NF_ARPTABLES) += arp_tables.o
diff --git a/net/ipv4/netfilter/ipt_TPROXY.c b/net/ipv4/netfilter/ipt_TPROXY.c
new file mode 100644
index 000..6f64717
--- /dev/null
+++ b/net/ipv4/netfilter/ipt_TPROXY.c
@@ -0,0 +1,103 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2006-2007 BalaBit IT Ltd.
+ * Author: Balazs Scheidler, Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+static unsigned int
+target(struct sk_buff **pskb,
+   const struct net_device *in,
+   const struct net_device *out,
+   unsigned int hooknum,
+   const struct xt_target *target,
+   const void *targinfo)
+{
+   const struct iphdr *iph = (*pskb)->nh.iph;
+   unsigned int verdict = NF_ACCEPT;
+   struct sk_buff *skb = *pskb;
+   struct udphdr _hdr, *hp;
+   struct sock *sk;
+
+   /* TCP/UDP only */
+   if ((iph->protocol != IPPROTO_TCP) &&
+   (iph->protocol != IPPROTO_UDP))
+   return NF_ACCEPT;
+
+   if (in == NULL)
+   return NF_ACCEPT;
+
+   if ((skb->dst != NULL) || (skb->ip_tproxy == 1))
+   return NF_ACCEPT;
+
+   hp = skb_header_pointer(*pskb, iph->ihl * 4, sizeof(_hdr), &_hdr);
+   if (hp == NULL)
+   return NF_DROP;
+
+   sk = ip_tproxy_get_sock(iph->protocol,
+   iph->saddr, iph->daddr,
+   hp->source, hp->dest, in);
+   if (sk != NULL) {
+   if (ip_tproxy_do_divert(skb, sk, 0, in) < 0)
+   verdict = NF_DROP;
+   sock_put(sk);
+   }
+
+   return verdict;
+}
+
+static int
+checkentry(const char *tablename,
+  const void *e,
+  const struct xt_target *target,
+   void *targinfo,
+   unsigned int hook_mask)
+{
+   /* checks are now done by the x_tables core based on
+* information specified in the ipt_target structure */
+   return 1;
+}
+
+static struct ipt_target ipt_tproxy_reg = {
+   .name   = "TPROXY",
+   .target = target,
+   .targetsize = sizeof(struct ipt_tproxy_target_info),
+   .table  = "tproxy",
+   .checkentry = checkentry,
+   .me = THIS_MODULE,
+};
+
+static int __init init(void)
+{
+   if (ipt_register_target(&ipt_tproxy_reg))
+   return -EINVAL;
+
+   return 0;
+}
+
+static void __exit fini(void)
+{
+

[PATCH/RFC 01/10] Implement local diversion of IPv4 skbs

2007-01-03 Thread KOVACS Krisztian
The input path for non-local bound sockets requires diverting certain
packets locally, even if their destination IP address is not
considered local. We achieve this by assigning a specially crafted dst
entry to these skbs, and optionally also attaching a socket to the skb
so that the upper layer code does not need to redo the socket lookup.

We also have to be able to differentiate between these fake entries
and "real" entries in the cache: it is perfectly legal that the
diversion is done only for certain TCP or UDP packets and not for all
packets of the flow. Since these special dst entries are used only by
the iptables tproxy code, and that code uses exclusively these
entries, simply flagging these entries as DST_DIVERTED is OK. All
other cache lookup paths skip diverted entries, while our new
ip_divert_local() function uses exclusively diverted dst entries.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/net/dst.h   |1 
 include/net/route.h |2 +
 net/ipv4/route.c|  106 +++
 3 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 62b7e75..72b712c 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -50,6 +50,7 @@ #define DST_NOXFRM 2
 #define DST_NOPOLICY   4
 #define DST_NOHASH 8
 #define DST_BALANCED   0x10
+#define DST_DIVERTED   0x20
unsigned long   lastuse;
unsigned long   expires;
 
diff --git a/include/net/route.h b/include/net/route.h
index 486e37a..ee52393 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -126,6 +126,8 @@ extern int  ip_rt_ioctl(unsigned int cmd
 extern void ip_rt_get_source(u8 *src, struct rtable *rt);
 extern int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb);
 
+extern int ip_divert_local(struct sk_buff *skb, const struct in_device *in, struct sock *sk);
+
 struct in_ifaddr;
 extern void fib_add_ifaddr(struct in_ifaddr *);
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2daa0dc..537b976 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -942,9 +942,11 @@ restart:
while ((rth = *rthp) != NULL) {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
if (!(rth->u.dst.flags & DST_BALANCED) &&
+   ((rt->u.dst.flags & DST_DIVERTED) == (rth->u.dst.flags & DST_DIVERTED)) &&
compare_keys(&rth->fl, &rt->fl)) {
 #else
-   if (compare_keys(&rth->fl, &rt->fl)) {
+   if (((rt->u.dst.flags & DST_DIVERTED) == (rth->u.dst.flags & DST_DIVERTED)) &&
+   compare_keys(&rth->fl, &rt->fl)) {
 #endif
/* Put it first */
*rthp = rth->u.rt_next;
@@ -1166,6 +1168,7 @@ void ip_rt_redirect(__be32 old_gw, __be3
if (rth->fl.fl4_dst != daddr ||
rth->fl.fl4_src != skeys[i] ||
rth->fl.oif != ikeys[k] ||
+   (rth->u.dst.flags & DST_DIVERTED) ||
rth->fl.iif != 0) {
rthp = &rth->u.rt_next;
continue;
@@ -1526,6 +1529,105 @@ static int ip_rt_bug(struct sk_buff *skb
return 0;
 }
 
+static void ip_divert_free_sock(struct sk_buff *skb)
+{
+   struct sock *sk = skb->sk;
+
+   skb->sk = NULL;
+   skb->destructor = NULL;
+   sock_put(sk);
+}
+
+int ip_divert_local(struct sk_buff *skb, const struct in_device *in, struct sock *sk)
+{
+   struct iphdr *iph = skb->nh.iph;
+   struct rtable *rth, *rtres;
+   unsigned hash;
+   const int iif = in->dev->ifindex;
+   u_int8_t tos;
+   int err;
+
+   /* look up hash first */
+   tos = iph->tos & IPTOS_RT_MASK;
+   hash = rt_hash_code(iph->daddr, iph->saddr ^ (iif << 5));
+
+   rcu_read_lock();
+   for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
+rth = rcu_dereference(rth->u.rt_next)) {
+   if (rth->fl.fl4_dst == iph->daddr &&
+   rth->fl.fl4_src == iph->saddr &&
+   rth->fl.iif == iif &&
+   rth->fl.oif == 0 &&
+   rth->fl.mark == skb->mark &&
+   (rth->u.dst.flags & DST_DIVERTED) &&
+   rth->fl.fl4_tos == tos) {
+   rth->u.dst.lastuse = jiffies;
+   dst_hold(&rth->u.dst);
+   rth->u.dst.__use++;
+   RT_CACHE_STAT_INC(in_hit

[PATCH/RFC 07/10] Export UDP socket lookup function

2007-01-03 Thread KOVACS Krisztian
The iptables tproxy code has to be able to do UDP socket hash lookups,
so we have to provide an exported lookup function for this purpose.

Signed-off-by: KOVACS Krisztian <[EMAIL PROTECTED]>

---

 include/net/udp.h |4 
 net/ipv4/udp.c|8 
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 1b921fa..ea5aa31 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -141,6 +141,10 @@ extern int udp_lib_setsockopt(struct so
   char __user *optval, int optlen,
   int (*push_pending_frames)(struct sock *));
 
+extern struct sock *udp4_lib_lookup(__be32 saddr, __be16 sport,
+   __be32 daddr, __be16 dport,
+   int dif);
+
 DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
 /*
  * SNMP statistics for UDP and UDP-Lite
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1b348f5..a44d3d3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -284,6 +284,14 @@ static struct sock *__udp4_lib_lookup(__
return result;
 }
 
+struct sock *udp4_lib_lookup(__be32 saddr, __be16 sport,
+__be32 daddr, __be16 dport,
+int dif)
+{
+   return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udp_hash);
+}
+EXPORT_SYMBOL_GPL(udp4_lib_lookup);
+
 static inline struct sock *udp_v4_mcast_next(struct sock *sk,
 __be16 loc_port, __be32 loc_addr,
 __be16 rmt_port, __be32 rmt_addr,
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: XFRM changing the view of xfrm_aevent_id

2006-12-01 Thread KOVACS Krisztian

  Hi,

On Friday 01 December 2006 15:37, jamal wrote:
> struct xfrm_aevent_id {
> struct xfrm_usersa_id   sa_id;
> __u32   flags;
> };
>
> I could add the two items mentioned above in it and break the ABI.
> This may sound dangerous, but the usage of this ABI is not widespread.
> AFAIK, the only other person who might have used this is Kristzian (on
> CC).

  I do not use the XFRM netlink interface at the moment, so breaking the 
ABI is absolutely not a problem for me.

--
 Regards,
  Krisztian Kovacs


Re: [XFRM] Restore aevent timer

2006-04-11 Thread KOVACS Krisztian

  Hi,

On Tuesday 11 April 2006 05.02, jamal wrote:
> Ok, if both you can provide feedback on the attached patch (untested but
> compiles) I will make any necessary changes, test and push this +
> documentation to Dave.

  Looks ok, although I only had a quick look at it.

-- 
 Regards,
  Krisztian Kovacs


Re: [XFRM] Restore aevent timer

2006-04-10 Thread KOVACS Krisztian

  Hi,

On Friday 07 April 2006 15:15, jamal wrote:
> Ok, I built on Herbert's suggestion and tried to be a little
> clever/accurate. Instead of a flag i introduce a variable that stores
> the jiffy point when the timer is killed. If we fall anywhere to the
> right or at exact point of the next point expected timer when the first
> packet arrives, then we do an immediate update of type
> XFRM_REPLAY_TIMEOUT else we keep receiving packets and wait until the
> right time.

  Ok, this certainly fixes the corner case. About the wrap-around: as 
Herbert wrote, in any sane setup there won't be a problem. 
time_after_eq() returns correct results as long as the difference between
the arguments is less than LONG_MAX; on 32-bit architectures with HZ=1000
this means roughly 24 days. Let's assume replay_maxage is negligible
compared to jiffies. So in case we don't have any traffic for about 24 
days then we'll miss the XFRM_REPLAY_TIMEOUT notification. Not a very 
realistic scenario IMHO.
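
  For illustration only, here is a user-space sketch of the wrap-around
arithmetic above. It mimics the signed-difference idiom behind
time_after_eq() on a fixed 32-bit counter; jiffies32_after_eq() and
wrap_horizon_days() are made-up names, not kernel API.

```c
#include <stdint.h>

/* Signed-difference comparison on a 32-bit tick counter: returns 1 if a is
 * at or after b. Only meaningful while the two values are less than 2^31
 * ticks apart. The unsigned-to-signed cast relies on two's complement
 * behaviour, which holds on all mainstream compilers. */
static int jiffies32_after_eq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Days of idle time after which the comparison above becomes ambiguous,
 * for a given tick rate (2^31 ticks divided by ticks per day). */
static double wrap_horizon_days(unsigned int hz)
{
    return 2147483648.0 / hz / 86400.0;
}
```

  With hz = 1000 the horizon comes out a little under 25 days, which is
where the "roughly 24 days" figure comes from.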

  But I still prefer the flag. With this timestamp thing you introduce a
delay of replay_maxage jiffies. Say the timer fires, but there was no 
traffic so no notification is sent. With the timestamp you won't generate 
an event for another replay_maxage even if there were some packets sent 
(but not reaching the seqno threshold). With only the flag we'll have a 
notification right at the first packet. I think this might allow slightly 
more accurate tracking of sequence numbers.
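
  A rough user-space model of the flag variant, just to make the behaviour
concrete. This is not the patch: the struct and function names are
hypothetical, and the threshold check is my simplification of the seqno
test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the replay bookkeeping in an SA. */
struct replay_sketch {
    uint32_t seq;          /* current sequence number */
    uint32_t notified_seq; /* sequence number at the last notification */
    uint32_t maxdiff;      /* seqno threshold for update events */
    bool timer_idle;       /* timer fired while nothing had changed */
};

/* Timer expiry: send an event only if the sequence number moved since the
 * last notification; otherwise just remember that we stayed silent.
 * Returns true when an event would be sent. */
static bool sketch_on_timeout(struct replay_sketch *s)
{
    if (s->seq == s->notified_seq) {
        s->timer_idle = true;
        return false;
    }
    s->notified_seq = s->seq;
    s->timer_idle = false;
    return true;
}

/* Packet processed: notify right away if the flag is set, otherwise only
 * once the threshold is reached. Any notification clears the flag. */
static bool sketch_on_packet(struct replay_sketch *s)
{
    s->seq++;
    if (s->timer_idle || s->seq - s->notified_seq >= s->maxdiff) {
        s->notified_seq = s->seq;
        s->timer_idle = false;
        return true;
    }
    return false;
}
```

  The point is the last case: after a silent timeout the very first packet
produces an event, without waiting another replay_maxage.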

> I havent tested this so there maybe holes, it compiles though. I would
> like to test it first (probably at end of day) and then submit; so
> please review and provide comments which i can incorporate before i
> punt to Dave.

  Another thing still present is the possible xfrm_state leak. I think we 
need to call xfrm_state_put() as the last statement of 
xfrm_replay_timer_handler() to drop the reference we acquired when 
starting the timer.
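
  The pairing I have in mind, as a user-space sketch with a plain counter
standing in for the real refcount. All names here are hypothetical; only
the hold-on-arm / put-at-end-of-handler pattern is the point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an xfrm_state with one pending timer. */
struct timer_ref_sketch {
    int refcnt;
    bool timer_armed;
};

static void sketch_hold(struct timer_ref_sketch *x)
{
    x->refcnt++;
}

static void sketch_put(struct timer_ref_sketch *x)
{
    assert(x->refcnt > 0);
    x->refcnt--;
}

/* Arming the timer takes a reference so the state outlives the timer. */
static void sketch_start_timer(struct timer_ref_sketch *x)
{
    if (!x->timer_armed) {
        sketch_hold(x);
        x->timer_armed = true;
    }
}

/* The handler must drop that reference as its last step, otherwise every
 * expiry leaks one reference and the state is never freed. */
static void sketch_timer_handler(struct timer_ref_sketch *x)
{
    x->timer_armed = false;
    /* ... decide whether to emit an event ... */
    sketch_put(x);
}
```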

-- 
 Regards,
  Krisztian Kovacs


Re: [XFRM] Restore aevent timer

2006-04-06 Thread KOVACS Krisztian

  Hi,

On Thursday 06 April 2006 17:18, jamal wrote:
> On Fri, 2006-07-04 at 00:30 +1000, Herbert Xu wrote:
> > If so I see what you mean but I think a better solution is to just
> > set a flag when the XFRM_REPLAY_TIMEOUT fires and nothing has
> > changed. Then when you get XFRM_REPLAY_UPDATE you can notify
> > unconditionally if this flag is set.
> >
> > The flag gets cleared in case of a notification.
>
> That does sound reasonable but i need to think about it a little in
> case it misses some scenario.
> Krisztian, see any leaks with that?

  None, it's fine with me.

-- 
 Regards,
  Krisztian Kovacs


Re: Possible bug in PFKEY implementation...

2006-03-13 Thread KOVACS Krisztian

  Hi,

On Sunday 12 March 2006 23.29, Stjepan Gros wrote:
> setkey command behaves strangely when SPD is large. Either because I'm
> doing something wrong or because there is a bug. I believe it's a bug,
> but who knows... Anyway, after 529 items it simply stops displaying
> items from SPD with a message
>
> recv: Resource temporarily unavailable

  This has been discussed a couple of times on netdev and on the ipsec-tools 
development list. You can find more info in these threads, for example:





  As a workaround for the problem you could try increasing the size of the 
socket buffers available for PF_KEY sockets. Unfortunately you still have 
to patch ipsec-tools for this to work, because for some unknown reason it 
forces 128K buffers on all pfkey sockets.
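
  For reference, a minimal sketch of what enlarging the receive buffer looks
like from user space. It works on any socket descriptor; grow_rcvbuf() is a
made-up helper name, and where exactly ipsec-tools would need to call
something like it is not shown here.

```c
#include <sys/socket.h>

/* Ask the kernel for a larger receive buffer on fd and report what was
 * actually granted. Returns the size read back with getsockopt(), or -1 on
 * error. On Linux the granted size is capped by net.core.rmem_max and the
 * value read back is double the request, to account for bookkeeping
 * overhead. */
static int grow_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

  If the value read back is smaller than expected, the system-wide limit
probably needs raising as well.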

-- 
 Regards,
  Krisztian Kovacs


Re: [Patch 1/6] IPSEC: core updates

2006-01-31 Thread KOVACS Krisztian

  Hi,

On Monday 30 January 2006 22:33, jamal wrote:
> >   We implemented partial ISAKMP SA synchronization in racoon. That
> Unfortunately this would also mean dependency on racoon. Is there any
> other way to do it without having to change racoon? example the phase1
> scripts or racoonctl?
> It seems to me that the only useful runtime parameter really - dont
> know how you would extract this without changing racoon - is peer/local
> cookies, no? From that one should be able to generate the SAs.

  Not really IMHO. You definitely need the shared secret associated with 
that SA, DPD state, etc.

  But what about leaving this alone for now? I think the very first step
should be something like OpenBSD's sasyncd, which absolutely does not
care about proper ISAKMP synchronization. We can think about these things
later, once all the kernel-level things are settled.

> >   Indeed, but this value depends on whether or not the user-space is
> > clever enough to use it. Let's suppose it will be. :)
>
> I do in the code i am testing with at the moment. I havent been testing
> a lot of corner cases - and so far havent needed any padding; but will
> let you know how it goes. In any case, lets agree we need it. Whoever
> feels brave enough to do without could.

  OK.

-- 
 Regards,
  Krisztian Kovacs


Re: [Patch 1/6] IPSEC: core updates

2006-01-30 Thread KOVACS Krisztian

  Hi,

On Monday 30 January 2006 14.14, jamal wrote:
[...]
> >   To put it simple: I don't think PF_KEY is worth the hassle unless
> > someone comes up with an open source software utilizing that interface.
>
> I agree. And if you look at something like sasyncd, it is obvious you
> dont need it if what you want to achieve is failover. Infact you dont
> need racoon in some cases at all (example this could be done for manual
> setup as well).
>
> In the case you do use racoon, the question is ISAKMP SAs; how do you
> fail those over? i.e how do you keep racoon-master and racoon-slave
> synced - one idea could be to make sure the cookies are synced. I know
> in your work you were doing something with racoon - was there anything
> of this sort that you did?

  We implemented partial ISAKMP SA synchronization in racoon. That way the 
cookies, the shared secrets, etc. were synchronized to the slaves, so that 
after failing over the new master could even do a phase2 exchange using the 
previously negotiated shared secret. In this case a PF_KEY interface is a 
good thing because that way you can have racoon

a) set up the desired event parameters for the SAs (no need to use the 
sysctl-controlled global default setting)

b) do the IPSEC SA synchronization as well, utilizing the already existing 
framework for master-slave communication (which is necessary for ISAKMP SA 
synchronization anyway).

> > > How expensive are timers these days? Lets handwave a moderately large
> > > number of SAs (100K). With this we would have 200K timers as opposed
> > > to 100K if we aggregated them.
> >
> >   Of course we could aggregate these timers, this is just a leftover
> > from the very first patch. However, this would mean that a more
> > fine-grained timing would be necessary for that timer instead of the
> > current precision of one second.
>
> I thought about this after i typed my last email. The more fine grained
> the timer, the better the failover time. The SA expiry timers dont need
> anything more than 1 sec granularity (infact anything lower than 60
> seconds will be considered strange)
>
> > Which is of course not a problem, just makes the patch a bit more
> > intrusive.
>
> true although i am still curious if it may be worth the complexity of
> having a single timer - the only reason i can see for aggregation is for
> perfomance reasons.

  Exactly, that's what I was trying to say. Although if you have a lot of 
SAs the memory needed by an additional timer_list may be significant (about 
48 bytes per SA on 64 bit architectures if my memory serves me well).

  BTW, has anyone done any testing with such a large number of SAs in place? I
would be seriously interested in the results, especially if some user-space 
KM is involved as well. I only have some limited (lab environment) 
experience with racoon, which is unable to handle more than a couple 
hundred SAs when running on Linux.

> >   One more thing: whether or not this timer is necessary really depends
> > on how the sync daemon uses these events. The most simple trick of
> > simply adding a large-enough constant to the sequence numbers of
> > outgoing packets after failover does not take advantage of this
> > feature. Of course if someone comes up with a more exact way of
> > handling failover cases, then those additional update messages could be
> > useful.
>
> The timer allows for more exact values. IOW, if you use the timer and
> you dont loose any events you can _guarantee_ exact numbers; with
> padding the replay it becomes:
> a) if you make it small more of how lucky you are in choosing a good
> padding or
> b) dropping a substantial amount of packets if you use brute force and
> make it large.
>
> The trick of padding the replay values on the backup node could still be
> used in addition to the timer but only a small padding is needed.
> In any case, this mechanism is still needed/valuable.

  Indeed, but this value depends on whether or not the user-space is clever 
enough to use it. Let's suppose it will be. :)

-- 
 Regards,
  Krisztian Kovacs


Re: [Patch 1/6] IPSEC: core updates

2006-01-29 Thread KOVACS Krisztian

  Hi,

On Saturday 28 January 2006 13:45, jamal wrote:
> > > +extern u32 sysctl_xfrm_aevent_etime;
> > > +extern u32 sysctl_xfrm_aevent_rseqth;
> >
> > Why do we need these defaults? I'd rather see these be removed and
> > just have the user-space KM always set the values (if it needs
> > aevent).
>
> This is mostly out of laziness when i started but has turned out to be
> useful. For ease of testing,  i didnt want to set anything or make any
> changes to a KM.
> My initial attempt was to default to a time of 1 second and a replay
> threshold of 1/2 the replay window. I was frustrated using racoon to
> find it was by default setting the window to 4 ;-> (otoh, pluto sets it
> to a very high number). I also couldnt justify why 1 sec or 10 sec or
> 30 seconds as i kept changing them.
> For this reason i figured the admin would be the better person to make
> that decision and i picked the defaults based on what i was
> experimenting with.
> I could remove them if this doesnt sound like a good reason.

  I don't really like the idea of generating events unless explicitly 
requested by the KM. Once a PF_KEY interface is in place such behaviour
might break compatibility with racoon's libipsec, which considers unknown
extension headers as errors.

  Whether or not a PF_KEY interface for these events is desired is also a 
good question. Initially I had a patch for PF_KEY, but only because we 
used racoon and it was easier to implement a pfkey extension than adding 
netlink support to racoon. Such PF_KEY extensions would just add even 
more types to the growing list of non-standard PF_KEY extension headers.

  To put it simple: I don't think PF_KEY is worth the hassle unless 
someone comes up with an open source software utilizing that interface.

> > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> > > index e12d0be..2ccf87a 100644
> > > --- a/net/xfrm/xfrm_state.c
> > > +++ b/net/xfrm/xfrm_state.c
> > >
> > > @@ -62,6 +65,8 @@ static void xfrm_state_gc_destroy(struct
> > >  {
> > >   if (del_timer(&x->timer))
> > >   BUG();
> > > + if (del_timer(&x->rtimer))
> > > + BUG();
> >
> > How about just using x->timer?
>
> This part of the patch i left as is from Krisztian's original patch.
> i have another timer i was going to add later (an idle timeout
> to emulate a certain vendor when an SA is idle) and felt i could use
> rtimer for that as well. For this reason i left it alone since it just
> worked ;->
>
> How expensive are timers these days? Lets handwave a moderately large
> number of SAs (100K). With this we would have 200K timers as opposed to
> 100K if we aggregated them.

  Of course we could aggregate these timers, this is just a leftover from 
the very first patch. However, this would mean that a more fine-grained 
timing would be necessary for that timer instead of the current  
precision of one second. Which is of course not a problem, just makes the 
patch a bit more intrusive.

> > This seems to be counter-intuitive.  Wouldn't it make more sense to
> > schedule a timer in the XFRM_REPLAY_UPDATE case, and not schedule one
> > in the XFRM_REPLAY_TIMEOUT case? Scheduling it in the
> > XFRM_REPLAY_TIMEOUT case means that you may be waking up every maxage
> > jiffies even when nothing at all is happening.  While doing it the
> > other way means that you only schedule it when something has happened
> > and we've suppressed the event due to maxdiff.
>
> This is part of Krisztian's original patch - he would be a better
> person to respond to if we can wake him up.
>
> Waking up when nothing is happening is OK, but waking up and generating
> events when nothing is happening is not. so the rescheduling is fine as
> far as i can see but what you describe is an optimization i.e
> knowing nothing has happened and never waking up is even better.
> Krisztian?

  Sure, pretty good point.

  One more thing: whether or not this timer is necessary really depends on 
how the sync daemon uses these events. The most simple trick of simply 
adding a large-enough constant to the sequence numbers of outgoing 
packets after failover does not take advantage of this feature. Of course 
if someone comes up with a more exact way of handling failover cases, 
then those additional update messages could be useful.
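
  To be concrete about the "large-enough constant" trick, a sketch of what
the new master might do on takeover. The function name is hypothetical, and
saturating instead of wrapping is my own assumption about what is safe for a
32-bit sequence counter.

```c
#include <stdint.h>

/* On failover, resume sending from the last synchronized outbound sequence
 * number plus a safety pad, so packets the old master sent after the last
 * sync are not reused. The result saturates at UINT32_MAX rather than
 * wrapping around to small values. */
static uint32_t failover_start_seq(uint32_t last_synced, uint32_t pad)
{
    if (last_synced > UINT32_MAX - pad)
        return UINT32_MAX;
    return last_synced + pad;
}
```

  The more often the sequence number is synchronized, the smaller the pad
can be, which is where the extra update events would help.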

-- 
 Regards,
  Krisztian Kovacs


Re: [PATCH] IPSec anti-replay sequence numbers

2005-08-05 Thread KOVACS Krisztian

  Hi,

On Friday 05 August 2005 12.50, Patrick McHardy wrote:
> Is there already userspace code which uses this feature somewhere?

  AFAIK Ulrich has a patch for OpenSWAN, and we (Balabit) have a patch 
for racoon. Unfortunately this racoon version is available only as a 
commercial product.

-- 
 Regards,
  Krisztian Kovacs


Re: [PATCH] IPSec anti-replay sequence numbers

2005-08-04 Thread KOVACS Krisztian

  Hi,

On Thu, 2005-08-04 at 12:56, Ulrich Weber wrote:
> thanks for revising Patrick! Attached is the updated patch.
> Sorry had no time yet to remove the sysctl variables.
> It will follow in a few weeks if I have more time :)

  Ulrich, I already have some code which supports per-state difference
settings, along with optional time limits. I don't know whether the
latter would be necessary, but adding the per-state diff values would be
trivial. I'll send a patch in a couple of days if I find the time to hack
it together.

  Some questions below:

> diff -Nru linux-2.6.13-rc3.org/include/net/xfrm.h linux-2.6.13-rc3/include/net/xfrm.h
> --- linux-2.6.13-rc3.org/include/net/xfrm.h   2005-07-18 10:24:11.0 +0200
> +++ linux-2.6.13-rc3/include/net/xfrm.h   2005-08-04 12:28:36.0 +0200
> @@ -134,6 +134,9 @@
>   /* State for replay detection */
>   struct xfrm_replay_state replay;
>  
> + /* Replay detection state at the time we sent the last notification */
> + struct xfrm_replay_state preplay;
> +
>   /* Statistics */
>   struct xfrm_stats   stats;
>  
> @@ -301,6 +304,10 @@
>   struct xfrm_tmpl    xfrm_vec[XFRM_MAX_DEPTH];
>  };
>  
> +/* which seqno */
> +#define XFRM_REPLAY_INBOUND  1
> +#define XFRM_REPLAY_OUTBOUND 2
> +
>  #define XFRM_KM_TIMEOUT  30
>  
>  struct xfrm_mgr
> @@ -312,6 +319,7 @@
>   struct xfrm_policy  *(*compile_policy)(u16 family, int opt, u8 *data, int len, int *dir);
>   int (*new_mapping)(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport);
>   int (*notify_policy)(struct xfrm_policy *x, int dir, struct km_event *c);
> + int (*notify_seq)(struct xfrm_state *x, u32 pid, u32 seq);

  Why do you need the pid and seq arguments here? The sequence number is
redundant information anyway. On the other hand, you don't seem to pass the
event to the notify_seq() callback, which could be handy in some cases. So
IMHO something like

  notify_seq(struct xfrm_state *x, int event)

would be more general.
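
  Roughly what I mean, as a self-contained sketch. The types below are
stand-ins rather than the real xfrm structures, and the event values are
made up; only the shape of the callback matters.

```c
#include <stddef.h>

/* Hypothetical event codes, named after the two replay events discussed in
 * these threads; the numeric values are arbitrary. */
enum sketch_replay_event {
    SKETCH_REPLAY_UPDATE  = 1,
    SKETCH_REPLAY_TIMEOUT = 2,
};

/* Minimal stand-in for struct xfrm_state. */
struct sketch_xfrm_state {
    unsigned int oseq; /* outbound sequence number, read from the state */
    int last_event;    /* last event delivered to the key manager */
};

/* Stand-in for struct xfrm_mgr with the proposed callback signature. */
struct sketch_mgr {
    int (*notify_seq)(struct sketch_xfrm_state *x, int event);
};

/* Example key-manager handler: it can read the sequence number from the
 * state itself, so only the event type needs to be passed in. */
static int sketch_notify_seq(struct sketch_xfrm_state *x, int event)
{
    if (x == NULL)
        return -1;
    x->last_event = event;
    return 0;
}

/* Core-side dispatch: managers without the callback are simply skipped. */
static int sketch_km_notify(const struct sketch_mgr *km,
                            struct sketch_xfrm_state *x, int event)
{
    return km->notify_seq ? km->notify_seq(x, event) : 0;
}
```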

> --- linux-2.6.13-rc3.org/net/key/af_key.c 2005-07-18 10:49:41.0 +0200
> +++ linux-2.6.13-rc3/net/key/af_key.c 2005-07-19 10:10:22.0 +0200
> @@ -2860,6 +2860,12 @@
>   return pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_REGISTERED, NULL);
>  }
>  
> +static int pfkey_send_replay_notify(struct xfrm_state *x, u32 pid, u32 seq)
> +{
> + /* FIXME: To be done*/
> + return 0;
> +}

  I also have a PF_KEY implementation of these features, but since we
have to define new message types to support all the features this is a
hard thing... (And consequently the code is more of a hack than correct
implementation.)

-- 
 Regards,
  Krisztian Kovacs
