I'm restarting the timer for this case.

The updated diff-marked specification is attached below.

Darren
--------------------------------------------------------------

Abstract
========
This case extends PSARC/2005/334 by adding the ability to intercept
packets at the MAC layer using the PFHooks infrastructure.

| This case makes only a few changes, all of them additions, to the interfaces
| committed by PSARC/2008/219 (see "net_getlifaddr", "hook_pkt_event_t", and
| "new NIC event" below for details).

Release Binding
---------------
This case seeks a patch binding.

Introduction
============
The PFHooks project, PSARC/2005/334, provides the ability to intercept packets
at the IP layer by adding hooks into the network stack.

Since its integration, customers have asked for the ability to intercept
packets at the MAC layer as well. This ability is also needed to enforce
security rules for xVM guest domains and exclusive zones.

Goals
-----
This case seeks to meet the following goals:
* provide hooks in the MAC layer that consumers can register with to
  intercept packets;

* provide the netinfo interface for MAC layer that gives consumers access to
  interface information, and the ability to inject or emit packets directly;

* modify IPFilter to allow the administrator to specify layer 2 rules, which
  include ethernet filtering rules and IP Filtering/NAT rules.

Out of scope
------------
This project only provides the ability to specify ethernet filtering rules
to match ethernet packets, and IP filtering/NAT rules to match/modify IP
packets at MAC layer. Providing the ability to specify rules to filter 
non-ethernet packets by matching the MAC header is out of scope for this
project.

The detailed design is described below for each major component.

Netinfo interface for MAC layer
===============================
netinfo interfaces
------------------
The hooks provided will generate events for NH_PHYSICAL_IN and NH_PHYSICAL_OUT,
using the same interface as IPv4 and IPv6 do in PSARC/2005/334.

The following functions will be supported through the netinfo(9f) framework:
net_getifname()
net_phylookup()
net_phygetnext()
net_getlifaddr()
net_inject()
net_getmtu()

All of the other functions in the netinfo(9f) framework will return a value
indicating that they are unsupported. The return values for the above
functions only have meaning within the scope of the corresponding family:
it is not correct to use a value returned by net_getifname() using the
ethernet net_data_t handle with net_phylookup() for IPv4.

The callback for these events will receive a pointer to a hook_pkt_event_t
structure that has the following fields filled out:

hpe_ifp - 0 for NH_PHYSICAL_OUT, otherwise a value indicating which
          interface the NH_PHYSICAL_IN event is associated with;
hpe_ofp - 0 for NH_PHYSICAL_IN, otherwise a value indicating which
          interface the NH_PHYSICAL_OUT event is associated with;
hpe_hdr - points to the start of the MAC header;
hpe_mb  - points to the start of the mblk_t that holds hpe_hdr;
hpe_mp  - points to the mblk_t that is the start of the packet;
hpe_hdrinfo - points to a mac_header_info_t that contains MAC header
          information.

| net_getlifaddr
| --------------
| The net_getlifaddr() function returns the address for a given interface.
| For the existing IP netinfo it returns the IP address, and for MAC layer
| netinfo it returns the MAC address of an interface, so the usage of this
| function is slightly different in the two situations.
| 
| int net_getlifaddr(const net_data_t net, const phy_if_t ifp,
|     const net_if_t lif,  int const type, struct sockaddr *storage);
| 
|     net
|          value returned from a successful call to net_protocol_lookup.
| 
|     ifp
|          value returned from a successful call to net_phylookup
|          or net_phygetnext, indicating which network interface
|          the information should be returned from.
|
|     lif
|          indicating which logical interface to fetch the address from.
| 
|     type
|          this indicates what type of address should be returned.
| 
|     storage
|          pointer to an area of memory to store the address data.
| 
| This case introduces a slightly different usage for this function
| when used to retrieve MAC layer information. Unlike IP, MAC doesn't
| have the concept of logical interface, so the caller should pass in
| the physical interface as ifp, and pass in a 0 as the lif because
| there is no valid lif for MAC.
| 
| Each call to net_getlifaddr requires that the caller pass in
| a pointer to an array of address information types to retrieve
| and an accompanying pointer to an array of pointers to struct
| sockaddr_dl structures in which to copy the address information
| into. See below for an example of how to use this function.
| 
| Each member of the address type array should be one of NA_ADDRESS,
| NA_PEER, NA_BROADCAST or NA_NETMASK, and it is up to each layer 2
| protocol to implement the address type. For Ethernet, NA_ADDRESS
| and NA_BROADCAST are supported, and NA_BROADCAST always returns
| ff:ff:ff:ff:ff:ff.
| 
hook_pkt_event_t
----------------
In order to intercept IP packets at the MAC layer, IPFilter needs to know the
size of the MAC header so that it can locate the start of the IP header. The
problem is that the wifi header is not self-describing: parsing it requires
information from the mac handle, so IPFilter cannot do the parsing itself.
Instead we rely on the MAC plugin to parse the header and pass the information
through the Hook framework to IPFilter, so that IPFilter can identify the
start of the IP header correctly.

While adding a header length field to hook_pkt_event_t solves the problem above,
down the road we may want to provide the ability to match on the wifi header,
which requires information about the wifi header fields in IPFilter, not just
| the header length. We therefore propose to add a pointer to hook_pkt_event_t
that points at a mac_header_info_t structure and is passed through the Hook
framework, so that hook consumers, like IPFilter, have the information they
| need about the MAC header. The new hook_pkt_event_t structure would look like:

typedef struct hook_pkt_event {
        net_handle_t            hpe_protocol;
        phy_if_t                hpe_ifp;
        phy_if_t                hpe_ofp;
        void                    *hpe_hdr;
        mblk_t                  **hpe_mp;
        mblk_t                  *hpe_mb;
        int                     hpe_flags;
-       void                    *hpe_reserved[2];
+       void                    *hpe_hdrinfo;
+       void                    *hpe_reserved[1];
} hook_pkt_event_t;

For existing IP/ARP Hooks, the header format is self-describing, so hpe_hdrinfo
will be NULL and IPFilter does the header parsing itself as before.
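
As a rough illustration of how a hook consumer might use the new pointer, the
sketch below locates the IPv4 header from a packet event. It is a user-space
model, not the proposed kernel code: sketch_hdrinfo_t is a hypothetical,
trimmed-down stand-in for mac_header_info_t, sketch_pkt_event_t carries only
the two hook_pkt_event_t fields needed here, and the field names shi_hdrsize
and shi_sap are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SAP_IPV4 0x0800  /* EtherType value for IPv4 */

/* Hypothetical, trimmed-down stand-in for mac_header_info_t. */
typedef struct sketch_hdrinfo {
    size_t   shi_hdrsize;   /* MAC header length in bytes */
    uint32_t shi_sap;       /* payload type carried after the header */
} sketch_hdrinfo_t;

/* Only the two hook_pkt_event_t fields this sketch needs. */
typedef struct sketch_pkt_event {
    void *hpe_hdr;          /* MAC header (layer 2) or IP header (IP hooks) */
    void *hpe_hdrinfo;      /* parsed MAC header info, NULL for IP/ARP hooks */
} sketch_pkt_event_t;

/*
 * Return a pointer to the start of the IPv4 header, or NULL if the event
 * does not carry an IPv4 packet.
 */
const uint8_t *
sketch_ip_header(const sketch_pkt_event_t *ev)
{
    const sketch_hdrinfo_t *hi;

    assert(ev != NULL);
    if (ev->hpe_hdr == NULL)
        return (NULL);

    hi = ev->hpe_hdrinfo;
    if (hi == NULL) {
        /* IP/ARP hook: hpe_hdr already points at the network header. */
        return (ev->hpe_hdr);
    }
    if (hi->shi_sap != SKETCH_SAP_IPV4)
        return (NULL);

    /* Layer 2 hook: skip the MAC header using the plugin-supplied size. */
    return ((const uint8_t *)ev->hpe_hdr + hi->shi_hdrsize);
}
```

The consumer never parses the MAC header itself in this model; it only trusts
the size reported by the provider, which is the property the wifi case needs.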

| MAC client index
| ----------------
| L2 filtering is based on the MAC client, which was introduced by the Crossbow
| project, and the filtering is done on a per MAC client basis. When users
| specify a link name "net0", this corresponds to the traffic going through the
| primary MAC client of net0, e.g. IP on top of that data link.
| 
| This project introduces the MAC client index, which uniquely identifies
| a MAC client and is used by the layer 2 netinfo interface in the same way
| as the ifindex is used by the IP netinfo interface. Layer 2 netinfo
| provides the mapping between a data link name and the index of the primary
| MAC client of that data link, through net_getifname() and net_phylookup().

new NIC event
-------------
The status of the network in the operating system often changes, from a
system being temporarily unplugged from the network to an interface's IP
address changing as a result of DHCP. The PFHooks framework therefore provides
an event notification mechanism for such changes.

The callback for these events will receive a pointer to a hook_nic_event_t
structure that has the following fields filled out:

hne_protocol - network protocol for events, returned from net_lookup

hne_nic      - physical interface associated with event

hne_lif      - logical interface (if any) associated with event

hne_event    - type of event occurring. The current list of events available is:

        NE_PLUMB
               indicates that an interface has just been created

        NE_UNPLUMB
               indicates that an interface has just been destroyed and that
               no more events should be received for it

        NE_UP
               indicates that an interface has changed state to "up" and 
               may now generate packet events.

        NE_DOWN
               indicates that an interface has changed state to "down" and
               will no longer generate packet events.

        NE_ADDRESS_CHANGE
               indicates that an address on an interface has changed.

hne_data     - pointer to extra data about event or NULL if none

hne_datalen  - size of data pointed to by hne_data (can be 0)

NE_NAME_CHANGE event
~~~~~~~~~~~~~~~~~~~~
As Clearview UV (PSARC/2006/499, PSARC/2007/527, PSARC/2008/002) introduces
the ability to rename a data link, we need to capture this event in order to
update IPFilter rules accordingly. We therefore propose an extension to
PSARC/2008/219 that adds a new hook event, NE_NAME_CHANGE, to nic_event_t
to indicate that an interface has been renamed. This particular event
is only available to layer 2 netinfo. In IP, a change of interface name
is represented by a NE_UNPLUMB and NE_PLUMB event pair.

typedef enum nic_event {
         NE_PLUMB = 1,
         NE_UNPLUMB,
         NE_UP,
         NE_DOWN,
         NE_ADDRESS_CHANGE,
+        NE_NAME_CHANGE
} nic_event_t;

Design considerations
~~~~~~~~~~~~~~~~~~~~~
IPFilter rules always match by name, and only the current link names are used
for matching, not old names. Upon an NE_NAME_CHANGE event, IPFilter will walk
all the layer 2 rules and resolve the interface name stored in each rule
structure into an interface pointer. So when a link is renamed, rules using
the old link name are invalidated, and rules using the new link name are
activated. If there's a filtering rule that applies to interface bge0, and
someone renames bge0 to net0, then the rule no longer matches packets received
on the link formerly known as bge0.

Also IPFilter has been designed to allow users to specify rules with interface
names that do not exist at the time they are loaded, and for those interface
names to be resolved at the time at which they're added to the system. Thus,
the mapping from the linkname to the linkid needs to happen in the kernel.
Changing IPFilter to use linkid instead of link name will not work.
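
The name-based resolution described above can be modelled in a few lines.
This is only a user-space sketch under stated assumptions: sk_link_t and
sk_rule_t are hypothetical structures, the link table stands in for what the
kernel would answer through net_phylookup(), and index 0 is assumed to mean
"not resolved". Calling sk_resolve_rules() again is what a NE_NAME_CHANGE
handler might do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_NAMELEN 32
#define SK_NO_IF   0u   /* assumption: index 0 means "not resolved" */

/* Hypothetical view of a data link: its current name and its index. */
typedef struct sk_link {
    char     sl_name[SK_NAMELEN];
    unsigned sl_index;
} sk_link_t;

/* Hypothetical rule: keeps the name it was written with. */
typedef struct sk_rule {
    char     sr_ifname[SK_NAMELEN];
    unsigned sr_ifindex;    /* SK_NO_IF while the name does not exist */
} sk_rule_t;

/* Map a link name to its index, or SK_NO_IF if no such link exists. */
unsigned
sk_lookup(const sk_link_t *links, size_t nlinks, const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < nlinks; i++) {
        if (strcmp(links[i].sl_name, name) == 0)
            return (links[i].sl_index);
    }
    return (SK_NO_IF);
}

/* Walk every rule and re-resolve its stored name against current names. */
void
sk_resolve_rules(sk_rule_t *rules, size_t nrules,
    const sk_link_t *links, size_t nlinks)
{
    size_t i;

    for (i = 0; i < nrules; i++) {
        rules[i].sr_ifindex =
            sk_lookup(links, nlinks, rules[i].sr_ifname);
    }
}
```

With one link named bge0 and two rules written for bge0 and net0, the first
pass resolves only the bge0 rule. After the link is renamed to net0 and the
rules are walked again, the bge0 rule becomes unresolved and the net0 rule,
which was loaded before its interface name existed, becomes active.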

Protocol & Hook registration
============================
Protocol registration
---------------------
With IP layer netinfo today we have 3 protocols: IPv4, IPv6 and ARP. For the
MAC layer, each MAC plugin type is treated as a different protocol, so
we'll have ethernet, wifi and ib. These protocols will be registered by
using net_protocol_register() when the corresponding MAC plugin gets loaded.

Hook registration
-----------------
IPFilter will register hooks for MAC layer protocols in the following cases:

when the first ethernet filtering rule is added
- register the ethernet hook

when the first "layer2" IP filtering/NAT rule is added
- register the ethernet, wifi and ib hooks

when it receives a notification indicating that a protocol is registered
- register the hook if there are rules for that corresponding protocol.

Since layer 2 filtering functionality is enabled automatically when the 
first layer 2 rule is added, the corresponding hook needs to be registered
then so packets can be passed to IPFilter from the hook framework.

It is possible that a rule for a layer 2 protocol is added before the
corresponding protocol is registered. Suppose a user has added a layer 2 IP
filtering rule on a system that only has ethernet cards, and then plugs a
wifi card into the system and sets it up. When the wifi MAC plugin is
loaded, the protocol will be registered and IPFilter will be notified via
the callback notification mechanism provided by the PFHooks API project.
IPFilter will then register the hook for that protocol so it can
receive and match wifi packets.
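
The registration cases above amount to a small piece of bookkeeping, which
the following user-space sketch models. Everything in it is hypothetical
(the sk_ names, the state structure, and the boolean that stands in for the
real hook registration call); it only illustrates the decision of when a
hook should be registered, not how IPFilter implements it. Unregistration
when the last rule is removed is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

enum sk_proto { SK_ETHER, SK_WIFI, SK_IB, SK_NPROTO };

/* Hypothetical bookkeeping for layer 2 hook registration. */
typedef struct sk_l2state {
    bool     proto_present[SK_NPROTO];  /* MAC plugin has registered */
    bool     hook_present[SK_NPROTO];   /* our hook is registered */
    unsigned ether_rules;               /* "family ether" rules loaded */
    unsigned layer2_ip_rules;           /* "layer2" IP/NAT rules loaded */
} sk_l2state_t;

/* "layer2" IP rules want every protocol; ether rules want ethernet only. */
bool
sk_hook_wanted(const sk_l2state_t *st, enum sk_proto p)
{
    if (st->layer2_ip_rules > 0)
        return (true);
    return (p == SK_ETHER && st->ether_rules > 0);
}

/* Register any hook that is wanted, possible, and not yet registered. */
void
sk_sync_hooks(sk_l2state_t *st)
{
    int p;

    for (p = 0; p < SK_NPROTO; p++) {
        if (st->proto_present[p] && !st->hook_present[p] &&
            sk_hook_wanted(st, (enum sk_proto)p)) {
            /* Stands in for the real hook registration call. */
            st->hook_present[p] = true;
        }
    }
}

void
sk_add_ether_rule(sk_l2state_t *st)
{
    st->ether_rules++;
    sk_sync_hooks(st);
}

void
sk_add_layer2_ip_rule(sk_l2state_t *st)
{
    st->layer2_ip_rules++;
    sk_sync_hooks(st);
}

/* Called when a protocol registration notification arrives. */
void
sk_protocol_registered(sk_l2state_t *st, enum sk_proto p)
{
    assert(p < SK_NPROTO);
    st->proto_present[p] = true;
    sk_sync_hooks(st);
}
```

In the wifi scenario, sk_add_layer2_ip_rule() runs while only ethernet is
present, so only the ethernet hook is registered; the later call to
sk_protocol_registered() for wifi registers the wifi hook without any rule
having to be reloaded.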

| Dynamic data path modification
| ------------------------------
| To make sure layer 2 filtering has no performance impact when disabled,
| instead of inserting hook checks into the fast path code, we make use of
| the function pointer driven approach provided by Crossbow where possible.
| On the RX side, the layer 2 filter implements its own receive function and
| replaces the default function with it when l2 filtering is enabled.
| The l2 filter specific receive function does layer 2 firewall processing
| before calling the original receive function, so when the l2 filter is
| disabled there's zero additional processing on the RX path. On the TX side,
| the layer 2 filter will force packets off the fast path when filtering is
| enabled, and add the hook check to the non fast path to avoid impacting
| performance. In both cases the data path will be modified dynamically when
| filtering is enabled/disabled, and this is done on a per MAC client basis.
| 
| To do this, a function needs to be called when the first hook is registered
| on a specific hook event, and when the last hook is unregistered from the
| event, to do the necessary data path setup. The hook_event_t structure is
| changed to accommodate this so that hook providers, MAC plugins in this case,
| can specify their own callbacks, which will be called from hook_register()/
| hook_unregister().
| 
| typedef struct hook_event_s {
|          int             he_version;
|          char            *he_name;       /* name of this hook list */
|          int             he_flags;       /* 1 = multiple entries allowed */
|          boolean_t       he_interested;  /* true if callback exist */
| +        void            (*he_enable_cb)(hook_event_token_t, hook_event_t *,
| +                            void *);
| +        void            (*he_disable_cb)(hook_event_token_t, hook_event_t *,
| +                            void *);
| +        void            *he_arg_cb;
| } hook_event_t;
| 
| he_arg_cb points at a mactype_t structure, to identify which MAC plugin the
| hook is registered on, as l2 filtering is enabled/disabled per MAC plugin.
| The two callback functions, pointed to by he_enable_cb and he_disable_cb, will
| walk through the MAC clients in the system and do the necessary data path
| setup/cleanup for the corresponding MAC clients, which are the primary MAC
| clients on top of data links of the specific MAC plugin.
|
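The interplay between the first/last hook callbacks and the RX function
pointer swap can be sketched in user space as follows. This is a simplified
model, not the proposed implementation: all sk_ names are hypothetical, the
callbacks take fewer arguments than he_enable_cb/he_disable_cb, the callback
argument points at a single client rather than at a mactype_t whose clients
are walked, and the "drop frames shorter than 14 bytes" policy is a toy
stand-in for real firewall processing.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sk_client sk_client_t;
typedef int (*sk_rx_fn_t)(sk_client_t *, const unsigned char *, size_t);

/* Hypothetical MAC client with a replaceable receive function. */
struct sk_client {
    sk_rx_fn_t sc_rx;       /* function the RX data path calls */
    sk_rx_fn_t sc_rx_orig;  /* saved default while filtering is enabled */
    unsigned   sc_delivered;
    unsigned   sc_dropped;
};

typedef struct sk_event sk_event_t;
typedef void (*sk_event_cb_t)(sk_event_t *, void *);

/* Hypothetical hook event with first/last registration callbacks. */
struct sk_event {
    unsigned      se_nhooks;
    sk_event_cb_t se_enable_cb;
    sk_event_cb_t se_disable_cb;
    void          *se_arg_cb;
};

/* Default receive path: no filtering work at all. */
int
sk_default_rx(sk_client_t *c, const unsigned char *pkt, size_t len)
{
    (void) pkt;
    (void) len;
    c->sc_delivered++;
    return (1);
}

/* Filtering receive path: check first, then call the original function. */
int
sk_filter_rx(sk_client_t *c, const unsigned char *pkt, size_t len)
{
    if (len < 14) {         /* toy policy, see lead-in */
        c->sc_dropped++;
        return (0);
    }
    return (c->sc_rx_orig(c, pkt, len));
}

void
sk_l2_enable(sk_event_t *ev, void *arg)
{
    sk_client_t *c = arg;

    (void) ev;
    c->sc_rx_orig = c->sc_rx;
    c->sc_rx = sk_filter_rx;
}

void
sk_l2_disable(sk_event_t *ev, void *arg)
{
    sk_client_t *c = arg;

    (void) ev;
    c->sc_rx = c->sc_rx_orig;
    c->sc_rx_orig = NULL;
}

/* The enable callback fires only on the 0 -> 1 transition. */
int
sk_hook_register(sk_event_t *ev)
{
    if (ev->se_nhooks++ == 0 && ev->se_enable_cb != NULL)
        ev->se_enable_cb(ev, ev->se_arg_cb);
    return (0);
}

/* The disable callback fires only on the 1 -> 0 transition. */
int
sk_hook_unregister(sk_event_t *ev)
{
    if (ev->se_nhooks == 0)
        return (-1);
    if (--ev->se_nhooks == 0 && ev->se_disable_cb != NULL)
        ev->se_disable_cb(ev, ev->se_arg_cb);
    return (0);
}
```

Because the swap happens only on the first registration and is undone only on
the last unregistration, a client with no hooks keeps calling sk_default_rx()
directly and pays nothing for the filter.
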
| Relative Hooks ordering
| =======================
| Order with Bridging
| -------------------
| L2 filtering is done on a per MAC client basis. When users specify "net0",
| this refers to the traffic going through the primary MAC client of net0, for
| example IP on top of that data link. This is different from all traffic going
| through the physical MAC instance, which is shared by multiple MAC clients.
| L2 hooks intercept both traffic from/to the wire and traffic that flows
| between multiple MAC clients defined on top of the same data link.
| 
| With regard to bridging, the L2 filter works on top of the bridge, instead of
| underneath it, as the filtering is based on MAC clients instead of the MAC
| instance that the bridge uses. This means that in certain cases L2 hooks are
| not able to see the actual interface used for transmit or receive, but only
| the interface that the network layer believes it's using: when IP sends a
| packet on one interface, the bridge may end up transmitting that packet
| on another interface, if that's the interface on which the destination
| exists or if the destination is unknown. This is by design, as l2 filtering
| aims at controlling what packets a VM can send to the wire via the data
| link it is using, rather than which physical link the packets actually get
| sent out from.
| 
| Order with bandwidth limit
| --------------------------
| L2 filter sits underneath bandwidth shaping by Crossbow. On RX side, the
| filtering is done before the bandwidth limit is applied; on TX side, it
| is applied after the bandwidth limit. 

IPFilter changes
================
Users can use ipf(1M) to add ethernet filtering rules in addition to IP
filtering rules. These ethernet filtering rules are marked with "family ether".
Users can also add IP Filtering/NAT rules and mark them with the "layer2"
keyword so that these rules are processed at the MAC layer instead of the IP
layer. Unlike IPv6, no special command line switch is required to load these
rules.

The "layer2" IP filtering/NAT rules go to existing ipf.conf, ipf6.conf and
ipnat.conf, respectively. The "family ether" rules go to a new configuration
file ipf-ether.conf.

The layer 2 filtering functionality will be enabled automatically when the
first ethernet rule or "layer2" IPFilter rule is added, and disabled when
the last such rule is removed. This functionality is only available in
the global zone.

Also, ipmon has been updated to print out log records with ethernet
information but the output of this command is volatile.

Rule processing 
---------------
Currently the processing order in IPFilter is:

[INPUT] -> IP NAT -> IP firewall -> { IP }  -> IP firewall -> IP NAT -> [OUTPUT]

With layer 2 filtering the processing order would become:

| [INPUT] -> L2 firewall -> "layer2" IP NAT -> "layer2" IP firewall ->
| ... -> IP NAT -> IP firewall -> { IP }  -> IP firewall -> IP NAT -> ...
| -> "layer2" IP firewall -> "layer2" IP NAT -> L2 firewall -> [OUTPUT]

Input processing
~~~~~~~~~~~~~~~~
Take input processing for an IP packet for example:

- MAC level filtering rules are processed first. These rules match on MAC
headers to determine if a packet should be passed or blocked. Administrators
use these rules to match on MAC addresses, MAC type, VLAN ID, etc.

- L2filter jumps over the MAC header, determines if this is an IP packet, and
does some sanity checking before passing it up to "layer2" IP rules for further
processing.

- Then "layer2" IP NAT rules are processed. Like IP layer NAT rules, these
rules do NAT for IP packets, but it is done at MAC layer instead of IP layer.

- Then "layer2" IP Filtering rules are processed. These rules provide IP 
Filtering at MAC layer.

- L2filter finishes processing and the packet is delivered up in the stack.
When the packet reaches IP, IP layer filtering/NAT processing is invoked,
and it works just as it does today.
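
For the Ethernet case, the "jump over the MAC header" step could look roughly
like the sketch below. It is an illustration only: the function name, the
return convention, and the particular sanity checks are assumptions, and the
project itself obtains the header size from the MAC plugin through
hpe_hdrinfo rather than parsing the frame in IPFilter. The EtherType values
and the 802.1Q tag layout are the standard ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ETHERTYPE_IPV4 0x0800
#define SK_ETHERTYPE_VLAN 0x8100    /* 802.1Q tag follows the addresses */
#define SK_ETHER_HDR_LEN  14        /* dst (6) + src (6) + type (2) */
#define SK_VLAN_TAG_LEN   4         /* TPID (2) + TCI (2) */
#define SK_IPV4_MIN_HDR   20

/*
 * Return the byte offset of the IPv4 header inside an Ethernet frame, or -1
 * if the frame is not a plausible IPv4 packet. *vlan_id is set to the VLAN
 * ID for tagged frames and to -1 for untagged frames.
 */
int
sk_ether_ip_offset(const uint8_t *frame, size_t len, int *vlan_id)
{
    size_t off = SK_ETHER_HDR_LEN;
    uint16_t type;

    assert(frame != NULL && vlan_id != NULL);
    *vlan_id = -1;
    if (len < SK_ETHER_HDR_LEN)
        return (-1);

    type = (uint16_t)((frame[12] << 8) | frame[13]);
    if (type == SK_ETHERTYPE_VLAN) {
        if (len < SK_ETHER_HDR_LEN + SK_VLAN_TAG_LEN)
            return (-1);
        /* The low 12 bits of the TCI hold the VLAN ID. */
        *vlan_id = ((frame[14] << 8) | frame[15]) & 0x0fff;
        type = (uint16_t)((frame[16] << 8) | frame[17]);
        off += SK_VLAN_TAG_LEN;
    }
    if (type != SK_ETHERTYPE_IPV4)
        return (-1);

    /* Sanity checks: room for a minimal header, and IP version 4. */
    if (len < off + SK_IPV4_MIN_HDR || (frame[off] >> 4) != 4)
        return (-1);
    return ((int)off);
}
```

An untagged IPv4 frame yields offset 14, and a frame tagged with VLAN 100
yields offset 18 with *vlan_id set to 100; anything else is left to the MAC
level rules only.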

Changes to output
-----------------
With layer 2 filtering, each type of rule has its own distinct processing
order, so the output of ipfstat/ipnat has been modified to show the rules
in a way that helps users understand the processing order.
The change only applies to the global zone; output in non-global zones remains
unchanged.

Example
~~~~~~~

# ipfstat -io
Ethernet rules:
empty list for ipfilter(out)
pass in family ether all
pass in family ether from 1:2:3:4:5:6 to any

layer 2 IP rules:
empty list for ipfilter(out)
pass in proto icmp from 1.1.1.1 to 2.2.2.2 layer2
block in proto tcp from 3.3.3.3 to 4.4.4.4 layer2

IP rules:
pass in all
pass out all

# ipnat -l
List of layer 2 active MAP/Redirect filters:
map bge1 from 2.3.4.5/32 to 6.7.8.9/32 -> 1.1.2.2/32 layer2

List of active MAP/Redirect filters:

List of active sessions:

Examples
--------
Prevent MAC address spoofing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppose we have a domU with an interface vnic0. We may want to ensure that:
- packets from this domU can only use its own source MAC address, preventing
this domU from pretending to be someone else
- packets from this source MAC address can only come from this domU, preventing
others from pretending to be this domU

Say vnic0 has MAC address 11:22:33:44:55:66; the rules would be something like:

block out family ether from 11:22:33:44:55:66 to any
block out on vnic0 family ether from any to any
pass out on vnic0 family ether from 11:22:33:44:55:66 to any

Prevent IP address spoofing
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppose we want to prevent a domU from using others' IP addresses. We could
use rules such as:

block out on vnic0 from any to any layer2
pass out on vnic0 from 1.1.1.1 to any layer2

where 1.1.1.1 is the IP address assigned to vnic0.

VLAN packets filtering
~~~~~~~~~~~~~~~~~~~~~~

Say we want to block some IP traffic. Below are examples of how this is done
with regard to VLANs:

- block all IP traffic regardless of VLAN

block in family ether type 0x800 

- block all IP traffic belonging to VLAN:

block in family ether type 0x800 with vlan

- block all IP traffic NOT belonging to VLAN:

block in family ether type 0x800 with not vlan 

- block all IP traffic for a specific VLAN (e.g. 100)

block in family ether type 0x800 vlan 100

Ioctl compatibility
-------------------
ABI compatibility with the old structure definitions is preserved by this case.

IPFILTER_VERSION (see ipnat(7i)) is used to keep track of the user
application's version, so old binaries still work after this change. The kernel
code handles the ioctl input/output based on the version number, which makes
this a compatible change. No change is required for user applications using the
interfaces.

The related data structures natlookup_t and nat_t remain the same, and the
ioctls SIOCGNATL/SIOCSTPUT will work correctly. Users can set a flag,
IPN_LAYER2, in natlookup_t and nat_t, respectively, to indicate whether they
are looking up/inserting a layer 2 NAT session or a layer 3 one. For
compatibility, the flag is not set by default, which indicates a layer 3
session.


Interfaces
==========
+----------------------------------------+-------------------+
| Interface                              |  Classification   |
+----------------------------------------+-------------------+
| NE_NAME_CHANGE                         |     Committed     |
| NHF_ETHER                              |     Committed     |
| NHF_WIFI                               |     Committed     |
| NHF_IB                                 |     Committed     |
| | <sys/hook.h>                         |     Committed     |          
| | <sys/hook_event.h>                   |     Committed     |          
| | <sys/neti.h>                         |     Committed     |          
+----------------------------------------+-------------------+
| "ipfilter_hook_eth_in"                 |    Uncommitted    |
| "ipfilter_hook_eth_out"                |    Uncommitted    |
| "ipfilter_hook_wifi_in"                |    Uncommitted    |
| "ipfilter_hook_wifi_out"               |    Uncommitted    |
| "ipfilter_hook_ib_in"                  |    Uncommitted    |
| "ipfilter_hook_ib_out"                 |    Uncommitted    |
+----------------------------------------+-------------------+
| "family ether"                         |      Committed    |
| "layer2"                               | Obsolete Volatile |
+----------------------------------------+-------------------+
| IPN_LAYER2                             |      Volatile     |
| /usr/include/netinet/ip_fil.h          |    Uncommitted    |
| /usr/include/netinet/ip_nat.h          |    Uncommitted    |
+----------------------------------------+-------------------+

