On 10/8/09, Luigi Rizzo <ri...@iet.unipi.it> wrote:
> On Thu, Oct 08, 2009 at 12:54:52AM +0200, Luigi Rizzo wrote:
>> On Wed, Oct 07, 2009 at 12:46:24PM -0700, Joe R wrote:
>> > We at IronPort have a requirement to do bandwidth management, but the
>> > traffic classification (and selection of bandwidth pipes) is done in
>> > userspace. Classification is done in userspace because the traffic
>> > classes are things like streaming audio traffic, video traffic,
>> > website categories, etc.
>> >
>> > Our appliance is based on FreeBSD, so we decided to look at dummynet
>> > to support this requirement. We could not use dummynet as-is because
>> > it relies on ipfw for packet classification, and there classification
>> > (and pipe selection) is done in the kernel based on TCP/IP parameters
>> > such as IP address and port.
>> >
>> > So we decided to extend dummynet/ipfw to support packet classification
>> > in userspace.
>> >
>> > Our idea is to extend the socket structure with a pipe number and add
>> > a setsockopt option to associate a pipe number with a socket. Then
>> > add a new ipfw target (mappedpipe), which passes the packet to
>> > dummynet (like the pipe target) but uses the pipe number stored in
>> > the socket structure if it is non-zero.
>> >
>> > I would like to hear your comments on this proposal, and if people are
>> > interested, I will be happy to submit a patch.
>>
>> I think the feature is useful. However, I would implement it as an
>> ipfw 'option' called "sockarg" (or similar), as follows:
>>
>>      ipfw add pipe tablearg sockarg
>>
>> where 'sockarg' succeeds ONLY if the packet is associated with a socket
>> on which the special setsockopt has been issued, and in that case
>> sets 'tablearg' to the value passed to setsockopt. This is
>> somewhat similar to the 'uid' and 'gid' options (except for setting
>> tablearg). This way the mechanism can be very general (not limited
>> to pipes), and the implementation is probably
>> simpler than the one you propose.
>>
>> In terms of runtime cost, looking at the check_uidgid() function,
>> there are two ways to implement this feature:
>> - as in check_uidgid(), actively look up a matching socket if one
>>   is not available. This is expensive but would also allow the
>>   feature to match incoming packets;
>> - only match if the args->inp parameter is non-null, otherwise do not
>>   call in_pcblookup_hash(). This is cheaper but clearly only works
>>   for locally generated packets.
>> Perhaps we could give 'sockarg' an argument so we can decide
>> whether or not to call in_pcblookup_hash() on a case-by-case
>> basis.
>
> To complete the analysis, I must say that I don't know how intrusive
> a setsockopt that attaches a classification tag to the socket would be.
> This is my main concern about merging your proposal into the system
> (and I am only concerned about the socket part; the ipfw change is
> trivial).
>
> Also, for completeness, there is another possible approach to
> your problem, which is more general and fully contained in
> ipfw (so less intrusive for the OS):
>
>   add a 'hashtable' structure to ipfw, which works in a way similar
>   to the 'table' with the difference that entries would be the whole
>   5-tuple of the packet.
>
> There is already a hash table in ipfw (used for dynamic rules), so
> it would only be a matter of adding the necessary glue to manipulate
> the hash table from /sbin/ipfw. An additional bonus of this approach
> is that one could use the new code to 'prime' the dynamic rule table
> after a reboot, a feature that people ask for from time to time.
>
>       cheers
>       luigi
>

Hi,

I am attaching a patch, taken against HEAD today, which implements
the socket and ipfw sockarg option as discussed in this thread.

With this patch applied, you can associate a pipe with a socket
from userspace using the new SO_SETBPIPE socket option, and an ipfw
rule such as

ipfw add 100 pipe tablearg sockarg

will forward the socket's traffic to the associated pipe.

Please let me know your comments.

Regards,
Joe.
Index: src/sbin/ipfw/ipfw2.c
===================================================================
RCS file: /home/ncvs/src/sbin/ipfw/ipfw2.c,v
retrieving revision 1.159
diff -c -u -r1.159 ipfw2.c
--- src/sbin/ipfw/ipfw2.c	19 Apr 2010 16:35:47 -0000	1.159
+++ src/sbin/ipfw/ipfw2.c	8 Sep 2010 22:29:48 -0000
@@ -266,6 +266,7 @@
 	{ "estab",		TOK_ESTAB },
 	{ "established",	TOK_ESTAB },
 	{ "setup",		TOK_SETUP },
+	{ "sockarg",		TOK_SOCKARG },
 	{ "tcpdatalen",		TOK_TCPDATALEN },
 	{ "tcpflags",		TOK_TCPFLAGS },
 	{ "tcpflgs",		TOK_TCPFLAGS },
@@ -1338,6 +1339,9 @@
 			case O_FIB:
 				printf(" fib %u", cmd->arg1 );
 				break;
+			case O_SOCKARG:
+				printf(" sockarg");
+				break;
 
 			case O_IN:
 				printf(cmd->len & F_NOT ? " out" : " in");
@@ -3531,6 +3535,9 @@
 			fill_cmd(cmd, O_FIB, 0, strtoul(*av, NULL, 0));
 			av++;
 			break;
+		case TOK_SOCKARG:
+			fill_cmd(cmd, O_SOCKARG, 0, 0);
+			break;
 
 		case TOK_LOOKUP: {
 			ipfw_insn_u32 *c = (ipfw_insn_u32 *)cmd;
Index: src/sbin/ipfw/ipfw2.h
===================================================================
RCS file: /home/ncvs/src/sbin/ipfw/ipfw2.h,v
retrieving revision 1.13
diff -c -u -r1.13 ipfw2.h
--- src/sbin/ipfw/ipfw2.h	19 Apr 2010 15:11:45 -0000	1.13
+++ src/sbin/ipfw/ipfw2.h	8 Sep 2010 22:29:48 -0000
@@ -199,6 +199,7 @@
 	TOK_FIB,
 	TOK_SETFIB,
 	TOK_LOOKUP,
+	TOK_SOCKARG,
 };
 /*
  * the following macro returns an error message if we run out of
Index: src/sys/kern/uipc_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.349
diff -c -u -r1.349 uipc_socket.c
--- src/sys/kern/uipc_socket.c	7 Aug 2010 17:57:58 -0000	1.349
+++ src/sys/kern/uipc_socket.c	8 Sep 2010 22:29:54 -0000
@@ -123,6 +123,8 @@
 #include <sys/socketvar.h>
 #include <sys/resourcevar.h>
 #include <net/route.h>
+#include <netinet/in.h>
+#include <netinet/ip_var.h>
 #include <sys/signalvar.h>
 #include <sys/stat.h>
 #include <sys/sx.h>
@@ -2461,6 +2463,26 @@
 				so->so_fibnum = 0;
 			}
 			break;
+
+		case SO_SETBPIPE:
+			if (ip_dn_io_ptr == NULL) {
+				error = ENOPROTOOPT;
+				goto bad;
+			}
+
+			error = sooptcopyin(sopt, &optval, sizeof optval,
+			    sizeof optval);
+			if (error)
+				goto bad;
+			if (optval < 0) {
+				error = EINVAL;
+				goto bad;
+			}
+
+			if (so->so_proto->pr_domain->dom_family == PF_INET)
+				so->so_pipenum = optval;
+			break;
+
 		case SO_SNDBUF:
 		case SO_RCVBUF:
 		case SO_SNDLOWAT:
Index: src/sys/netinet/ip_fw.h
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ip_fw.h,v
retrieving revision 1.138
diff -c -u -r1.138 ip_fw.h
--- src/sys/netinet/ip_fw.h	15 Mar 2010 17:14:27 -0000	1.138
+++ src/sys/netinet/ip_fw.h	8 Sep 2010 22:29:58 -0000
@@ -192,10 +192,12 @@
 
 	O_SETFIB,		/* arg1=FIB number */
 	O_FIB,			/* arg1=FIB desired fib number */
+
+	O_SOCKARG,		/* socket argument */
 
 	O_LAST_OPCODE		/* not an opcode!		*/
 };
 
 /*
  * The extension header are filtered only for presence using a bit
  * vector with a flag for each header.
Index: src/sys/netinet/ipfw/ip_fw2.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ipfw/ip_fw2.c,v
retrieving revision 1.45
diff -c -u -r1.45 ip_fw2.c
--- src/sys/netinet/ipfw/ip_fw2.c	27 Jul 2010 14:26:34 -0000	1.45
+++ src/sys/netinet/ipfw/ip_fw2.c	8 Sep 2010 22:30:05 -0000
@@ -1801,6 +1801,46 @@
 					match = 1;
 				break;
 
+			case O_SOCKARG: {
+				struct inpcb *inp = args->inp;
+				struct inpcbinfo *pi;
+				int pipenum = 0;
+
+				if (is_ipv6)
+					break;
+
+				if (proto == IPPROTO_TCP)
+					pi = &V_tcbinfo;
+				else if (proto == IPPROTO_UDP)
+					pi = &V_udbinfo;
+				else
+					break;
+
+				/*
+				 * For incoming packets, look up the inpcb
+				 * from the src/dst ip/port tuple, and read
+				 * the tag while the pcbinfo lock is held.
+				 */
+				if (inp == NULL) {
+					INP_INFO_RLOCK(pi);
+					inp = in_pcblookup_hash(pi,
+						src_ip, htons(src_port),
+						dst_ip, htons(dst_port),
+						0, NULL);
+					if (inp != NULL &&
+					    inp->inp_socket != NULL)
+						pipenum = inp->inp_socket->so_pipenum;
+					INP_INFO_RUNLOCK(pi);
+				} else if (inp->inp_socket != NULL)
+					pipenum = inp->inp_socket->so_pipenum;
+
+				if (pipenum != 0) {
+					tablearg = pipenum;
+					match = 1;
+				}
+				break;
+			}
+
 			case O_TAGGED: {
 				struct m_tag *mtag;
 				uint32_t tag = (cmd->arg1 == IP_FW_TABLEARG) ?
Index: src/sys/netinet/ipfw/ip_fw_sockopt.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ipfw/ip_fw_sockopt.c,v
retrieving revision 1.17
diff -c -u -r1.17 ip_fw_sockopt.c
--- src/sys/netinet/ipfw/ip_fw_sockopt.c	7 Apr 2010 08:23:58 -0000	1.17
+++ src/sys/netinet/ipfw/ip_fw_sockopt.c	8 Sep 2010 22:30:06 -0000
@@ -572,6 +572,7 @@
 		case O_IPTOS:
 		case O_IPPRECEDENCE:
 		case O_IPVER:
+		case O_SOCKARG:
 		case O_TCPWIN:
 		case O_TCPFLAGS:
 		case O_TCPOPTS:
Index: src/sys/sys/socket.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/socket.h,v
retrieving revision 1.105
diff -c -u -r1.105 socket.h
--- src/sys/sys/socket.h	9 Jan 2010 23:24:49 -0000	1.105
+++ src/sys/sys/socket.h	8 Sep 2010 22:30:07 -0000
@@ -137,6 +137,7 @@
 #define	SO_LISTENQLEN	0x1012		/* socket's complete queue length */
 #define	SO_LISTENINCQLEN	0x1013	/* socket's incomplete queue length */
 #define	SO_SETFIB	0x1014		/* use this FIB to route */
+#define	SO_SETBPIPE	0x1015		/* use this pipe to throttle */
 #endif
 
 /*
Index: src/sys/sys/socketvar.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/socketvar.h,v
retrieving revision 1.173
diff -c -u -r1.173 socketvar.h
--- src/sys/sys/socketvar.h	18 Jul 2010 20:57:53 -0000	1.173
+++ src/sys/sys/socketvar.h	8 Sep 2010 22:30:07 -0000
@@ -118,6 +118,7 @@
 		char	*so_accept_filter_str;	/* saved user args */
 	} *so_accf;
 	int so_fibnum;		/* routing domain for this socket */
+	int so_pipenum;		/* dummynet pipe set by SO_SETBPIPE */
 };
 
 /*
_______________________________________________
freebsd-ipfw@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ipfw
To unsubscribe, send any mail to "freebsd-ipfw-unsubscr...@freebsd.org"
