Folks,

Attached is a document that should help people wishing to use the
generic netlink interface. It is a WIP, so there is a lot more to go if
i see interest. The doc has been around for a while; i spent part of
yesterday and this morning cleaning it up. If you have sent me comments
before, please forgive me for having misplaced them - just send them
again.

cheers,
jamal

PS:- I dont have a good place to put this doc and point to, hence the
17K attachment
1.0 Problem Statement
-----------------------

Netlink is a robust wire-format IPC typically used for kernel-user
communication, although it could also be used as a communication
carrier for user-user and kernel-kernel communication.

A typical netlink connection setup is of the form:

    netlink_socket = socket(PF_NETLINK, socket_type, netlink_family);

where netlink_family selects the netlink "bus" to communicate on.
Examples of families are NETLINK_ROUTE, which is 0x0, and NETLINK_XFRM,
which is 0x6. [Refer to RFC 3549 for a high level view and look at
include/linux/netlink.h for some of the allocated families].

Over the years, due to its robust design, netlink has become very
popular. This has resulted in the danger of running out of family
numbers to issue. At netconf 2005 in Montreal it was decided to find
ways to work around the allocation challenge, and as a result the
NETLINK_GENERIC "bus" was born.

This document gives a mid-level view of NETLINK_GENERIC and how to use
it. The reader does not necessarily have to know what netlink is, but
needs to know at least the encapsulation used - which is described in
the next section. There are some implicit assumptions about what
netlink is or what structures like TLVs are etc. I apologize i dont
have much time to give a tutorial - invite me to some odd conference
and i will be forced to do better than this doc. Better yet, send
patches to this doc.

2.0 High Level view
--------------------

In order to illustrate the way different components talk to each other,
the diagram below provides an abstraction of how the operations happen.
There are two (three depending on your perspective) components:

1) The generic netlink connection, which for illustration is referred
   to as a "bus". The generic netlink bus is shown as split between
   user and kernel domains: this means programs can connect to the bus
   from either kernel or user space.

2) Components that talk to each other after attaching to the bus.
   a) Two users are shown in user space
   b) Three are shown in the kernel.

All boxes have kernel-wide unique identifiers that can be used to
address them. Typically, user space boxes exist to control one or more
kernel level boxes, i.e. they update some attributes that exist in a
kernel level box.
Any of these "boxes" can communicate with each other by first
connecting to the bus and then sending messages addressed to any box.

               +----------+        +----------+
               |  user1   | ...... |  user-n  |
               +--+-------+        +-------+--+
                  |                        |
                  |                        |               User
        +---------+------------------------+---------+  Space/domain
  user  |                                            |
--------+            Generic Netlink Bus             +-----------
 kernel |                                            |    Kernel
        +-------+-------------+--------------+-------+  Space/domain
                |             |              |
                |             |              |
             +--+-------+ +---+-----+ +------+-+
             |controller| | foobar  | | googah |
             +----------+ +---------+ +--------+

The controller is a special built-in user of the bus. It is the
repository of info on kernel components that have attached to the bus.
It has a reserved address identifier of 0x10.
By querying the controller, one could find out that both foobar and
googah are registered and what their IDs are etc. Essentially it is a
namespace translator, not unlike what DNS is for IP addresses. More on
this later.

To get to the point of the most common usage of netlink (user space
control of a kernel component), the diagram below breaks things down
for a single user program that controls a kernel module called foobar.
The example is kept simple for illustration purposes; in practice, user
space could control a lot more kernel modules.

               +-------------------------------------------+
               |               user program                |
               +----^----------^----------^----------^-----+
                    |          |          |          |
                   (2)        (1)        (4)        (3)
               gnl events     gnl      foobar     foobar
                           discovery   events  config/query
                    |          |          |          |
        +-----------+----------+----------+----------+------------+
        |                   Generic Netlink Bus                   |
        +-----------+----------+----------+----------+------------+
                    |          |          |          |
                    |          v          |          v
               +----+----------+----+ +---+----------+----+
               |     controller     | |      foobar       |
               +--------------------+ +-------------------+

#1: The user space program could start by discovering the existence of
    foobar by doing a dump of all existing modules or by doing a
    specific query by name. At that point it knows the ID of foobar.

#2: The user space program could subscribe to listen to events
    announcing newly appearing kernel modules or the departure of
    existing ones.

#3: The user space program could configure foobar or do queries on
    existing state.

#4: The user space program could subscribe to listen to events on
    foobar. Note these events are up to the programmer of foobar.
    Typical events could be things like modifications of attributes
    (for example by other user space programs), or creation or deletion
    of attributes etc.

Events (#2, #4) are by definition asynchronous and unidirectional as
shown, while configuration and querying (#1, #3) are synchronous
query-response operations.

2.1 Kernel <--> User space Communication.
-----------------------------------------

Essentially nothing new; communication is as in the standard netlink
approach, i.e. from user space you open a netlink socket to the kernel
- in this case family NETLINK_GENERIC - and send requests and receive
responses as well as asynchronous events. To receive events you
subscribe to specific multicast groups.
You really should use libnetlink or libnl to simplify your life in user
space.

2.2 Kernel <--> User space encapsulation.
-----------------------------------------

Between user space and the kernel, the message passed around looks as
follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           nlmsghdr                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Generic message header                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             optional user specific message header             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Optional user specific TLVs                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2.2.1 nlmsghdr
--------------

The nlmsghdr is the standard one as in:

    struct nlmsghdr
    {
            __u32           nlmsg_len;      /* Length including header */
            __u16           nlmsg_type;     /* Message content */
            __u16           nlmsg_flags;    /* Additional flags */
            __u32           nlmsg_seq;      /* Sequence number */
            __u32           nlmsg_pid;      /* Sending process PID */
    };

The address of a specific kernel module is carried in nlmsg_type.
The remaining fields of the netlink header are used exactly as in
current netlink (refer to RFC 3549).

2.2.2 Generic message header
----------------------------

The generic message header looks as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    command    |    version    |           reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

command is an 8 bit field that your kernel/user code understands.
Typical commands get, delete, add or dump attributes or vectors of
attributes.

The header is defined like so in C-speak:

    struct genlmsghdr {
            __u8    cmd;
            __u8    version;
            __u16   reserved;
    };

A get passed with the netlink flag NLM_F_DUMP is understood to be a
request for a dumper.
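
To make the layout in 2.2.1 and 2.2.2 concrete, here is a minimal user
space sketch that lays the two headers out back to back in a buffer. It
is an illustration under assumptions, not code from the generic netlink
patches: the demo_ structures are local mirrors of the headers shown
above (real code would include <linux/netlink.h> and
<linux/genetlink.h>), and demo_build_request() is a hypothetical
helper. No user specific header or TLVs are appended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct nlmsghdr (assumption: built without kernel headers). */
struct demo_nlmsghdr {
	uint32_t nlmsg_len;   /* Length including header */
	uint16_t nlmsg_type;  /* For generic netlink: ID of the target family */
	uint16_t nlmsg_flags; /* Additional flags */
	uint32_t nlmsg_seq;   /* Sequence number */
	uint32_t nlmsg_pid;   /* Sending process PID */
};

/* Local mirror of struct genlmsghdr. */
struct demo_genlmsghdr {
	uint8_t  cmd;
	uint8_t  version;
	uint16_t reserved;
};

#define DEMO_NLM_F_REQUEST 0x01  /* same value as NLM_F_REQUEST */
#define DEMO_NLM_F_DUMP    0x300 /* same value as NLM_F_DUMP */

/*
 * Hypothetical helper: write a netlink header followed by a generic
 * message header at the start of buf. Returns the total message length,
 * or 0 if buf is too small. Both headers are multiples of 4 bytes, so
 * no padding is needed between them.
 */
static size_t demo_build_request(void *buf, size_t buflen,
				 uint16_t family_id, uint16_t flags,
				 uint8_t cmd, uint8_t version,
				 uint32_t seq, uint32_t pid)
{
	struct demo_nlmsghdr nlh;
	struct demo_genlmsghdr gnlh;
	size_t total = sizeof(nlh) + sizeof(gnlh);

	assert(buf != NULL);
	if (buflen < total)
		return 0;

	nlh.nlmsg_len = (uint32_t)total;
	nlh.nlmsg_type = family_id; /* the "address" of the kernel module */
	nlh.nlmsg_flags = flags;
	nlh.nlmsg_seq = seq;
	nlh.nlmsg_pid = pid;

	gnlh.cmd = cmd;
	gnlh.version = version;
	gnlh.reserved = 0;

	memcpy(buf, &nlh, sizeof(nlh));
	memcpy((char *)buf + sizeof(nlh), &gnlh, sizeof(gnlh));
	return total;
}
```

With a hypothetical family ID of 0x123 and command number 3, calling
demo_build_request(buf, sizeof(buf), 0x123, DEMO_NLM_F_REQUEST, 3, 1,
1, 0) fills the first 20 bytes of buf: 16 for the netlink header and 4
for the generic message header. OR-ing DEMO_NLM_F_DUMP into the flags
would turn the same get into a dump request.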

2.2.3 optional user specific message header
---------------------------------------------

One could add extra fields, preferably in multiples of 32 bits, as:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                                                               ~
~                                                               ~
~                                                               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The kernel module needs to understand the extra header. Under typical
circumstances this extension header doesn't exist.

2.2.4 Optional user specific TLVs
----------------------------------

The user specific header is typically followed by a list of optional
attributes in the form of TLV structures. The example we have below has
a few TLVs for illustration.
The attributes carry all the data that needs to be exchanged. This
enforces structured formatting.

Messages can of course be batched as long as the socket buffers allow
it.

3.0 Kernel point of view
------------------------

Inside the kernel, the code wishing to communicate using netlink
registers its presence by using the structure genl_family, which looks
as follows:

    struct genl_family
    {
            unsigned int            id;
            unsigned int            hdrsize;
            char                    name[GENL_NAMSIZ];
            unsigned int            version;
            unsigned int            maxattr;
            struct module *         owner;
            struct nlattr **        attrbuf;        /* private */
            struct list_head        ops_list;       /* private */
            struct list_head        family_list;    /* private */
    };

- id is the field which is used in the nlmsg_type of the netlink
  header. Messages matching this id, which are known to belong to you,
  are multiplexed to your specific registered handlers (more below).
  IDs cannot be below 0x10 and cannot exceed 0xFFFF. 0x10 is reserved
  for the controller. IDs are unique system wide.

- hdrsize is the size in bytes of your message header, which follows
  the netlink and generic message headers but precedes the TLVs. If you
  have no specific message header, this should be 0.

- name is the string identifier you wish to be referred to by. Names
  also have to be unique.

- version is whatever version you want for your own maintenance. The
  core code doesn't interpret it.

- maxattr is the maximum number of attributes (TLVs) you expect to see.
  You can own up to 2^16 attribute types; the danger is that memory is
  allocated to hold attributes, so use with care. Typically you
  shouldn't have more than 10-30 types that you pass around. Keep
  reading to see examples of what this is.

You probably shouldn't touch the other fields.

3.1 Kernel level Example of registering a component
----------------------------------------------------

First let's talk about registering a component foobar so that it is
visible at the controller. We then talk about adding support for some
simple commands which can be sent to it from user space.

3.1.1 Adding foobar
------------------

    // Your static ID
    //
    #define GENL_ID_FOOBAR 0x123

    // all commands you want to process
    // typically 0 is reserved
    enum {
            FOOBAR_CMD_UNSPEC,
            FOOBAR_CMD_NEWTYPE,
            FOOBAR_CMD_DELTYPE,
            FOOBAR_CMD_GETTYPE,
            FOOBAR_CMD_NEWOPS,
            FOOBAR_CMD_DELOPS,
            FOOBAR_CMD_GETOPS,
            /* add future commands here */
            __FOOBAR_CMD_MAX,
    };

    #define FOOBAR_CMD_MAX (__FOOBAR_CMD_MAX - 1)

    // the attributes you want to own
    enum {
            FOOBAR_ATTR_UNSPEC,
            FOOBAR_ATTR_TYPE,
            FOOBAR_ATTR_TYPEID,
            FOOBAR_ATTR_TYPENAME,
            FOOBAR_ATTR_OPER,
            /* add future attributes here */
            __FOOBAR_ATTR_MAX,
    };

    #define FOOBAR_ATTR_MAX (__FOOBAR_ATTR_MAX - 1)

    // struct mymsghdr is your optional user specific message header
    static struct genl_family foobar = {
            .id = GENL_ID_FOOBAR,
            .name = "foobar",
            .version = 0x1,
            .hdrsize = sizeof(struct mymsghdr),
            .maxattr = FOOBAR_ATTR_MAX,
    };

So then you register yourself to receive these messages ..

Note: Your static id GENL_ID_FOOBAR is _not_ guaranteed to be allocated
to you. This is so because the system guarantees uniqueness. If some
other code has already registered that ID, it will be too late. You can
however get a dynamically allocated ID by passing
GENL_ID_GENERATE (0x0) as the ID.
In the dynamic case, when the registration succeeds your .id is set to
whatever the system allocated. The user space part can discover this id
by querying the controller for your name.

    err = genl_register_family(&foobar);

The registration could fail and return one of the following:

1) -EINVAL if you do any of the following:
   a) have an ID that is less than GENL_MIN_TYPE
   b) pass a hdrsize that is either not a multiple of 4 bytes or is
      less than the minimal mandated size of 4 bytes
2) -EEXIST if your name or id is already registered
3) -ENOMEM if:
   a) you passed GENL_ID_GENERATE and there are no more IDs left
   b) the core failed to allocate memory for your .attrbuf.
4) -EBUSY if there are issues loading the module.

On successful registration you get 0 returned.

You MUST unregister if you are going to exit, since some memory is
allocated. You do this via:

    genl_unregister_family(&foobar);

3.1.2 Adding foobar commands
-----------------------------

Next we need to register commands that will be processed by your ID.
There are two classes of commands:

a) A dumper that looks like:

       int (*dumpit)(struct sk_buff *skb, struct netlink_callback *cb);

   This callback is invoked when user space calls you with the
   NLM_F_DUMP flag. You are passed an skb which you fill in with the
   data you need to dump. There is a netlink_callback that you use to
   store state so you can continue dumping afterwards. As long as you
   return > 0, the system will continue to call you with skbs where you
   can stash more data. Typically the trick is that you should return
   skb->len. When you have nothing left to add, skb->len will be 0.
   More later.

b) A callback for all other commands:

       int (*doit)(struct sk_buff *skb, struct genl_info *info);

   where struct genl_info is:

       struct genl_info
       {
               u32                     snd_seq;
               u32                     snd_pid;
               struct nlmsghdr *       nlhdr;
               struct genlmsghdr *     genlhdr;
               void *                  userhdr;
               struct nlattr **        attrs;
       };

   The system will call you with an skb where the message for you is
   stored. nlhdr points right at the beginning of the message. genlhdr
   is the generic message header mentioned earlier. If you have a
   message header, it will be passed to you pointed to by userhdr. If
   your messaging uses TLVs, they will be pointed to by attrs, and you
   can process them by indexing by type into attrs. More later.
   You should return 0 on success and a meaningful error code < 0 on
   failure.

Ok, so how do you register your command? Use the structure genl_ops,
which looks like:

    struct genl_ops
    {
            unsigned int            cmd;
            unsigned int            flags;
            struct nla_policy       *policy;
            int                    (*doit)(struct sk_buff *skb,
                                           struct genl_info *info);
            int                    (*dumpit)(struct sk_buff *skb,
                                             struct netlink_callback *cb);
            struct list_head        ops_list;
    };

- cmd is the command identifier.
- flags are descriptors for the command.
- policy is used further to validate attributes.
- doit and dumpit have been discussed above.

To register for the dumper, you must pass GENL_DUMP_CMD in the flags.

Dumper Example:

    /* nothing to dump yet: returning 0 ends the dump */
    static int foobar_dump(struct sk_buff *skb, struct netlink_callback *cb)
    {
            return 0;
    }

    static struct genl_ops foobar_dump_ops = {
            .cmd = FOOBAR_CMD_GETTYPE,
            .flags = GENL_DUMP_CMD,
            .dumpit = foobar_dump,
    };

    err = genl_register_ops(&foobar, &foobar_dump_ops);

err will be -EINVAL if foobar is not registered yet or if you pass a
NULL for the ops. -EEXIST is returned if the command is found to have
been registered already.

And an example for the standard interface:

    static int foobar_do(struct sk_buff *skb, struct genl_info *info)
    {
            return 0;
    }

Let's register for it to be invoked every time the command
FOOBAR_CMD_GETTYPE is passed from user space.

    /* named foobar_do_ops so it does not clash with the foobar_do() handler */
    static struct genl_ops foobar_do_ops = {
            .cmd = FOOBAR_CMD_GETTYPE,
            .doit = foobar_do,
    };

    err = genl_register_ops(&foobar, &foobar_do_ops);

TODO:
a) Add a more complete compiling kernel module with events.
   Have Thomas put up his Mashimaro example and point to it.
b) Describe some details on how user space -> kernel works,
   probably using libnl??
c) Describe discovery using the controller..
d) Talk about policies etc.
e) Talk about how something coming from user space eventually
   gets to you.
f) Talk about the TLV manipulation stuff from Thomas.
g) Submit the controller patch to iproute2.