Dropping one packet in a coalesced skb
Hi,

I have LSO enabled on my machine, so a single skb can correspond to multiple packets on the wire. I am writing a kernel module that tries dropping packets. Is there a way to drop just one 1500-byte TCP packet inside an skb that can be a 64 KB TCP segment? Because of LSO this segment corresponds to about 40 packets, and I don't want to drop all of them; just a few (2-3) packets.

Can the effect be achieved if I modify skb_shinfo(skb)->gso_segs, or nr_frags?

Thanks,
--
Vimal
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: task_struct's real_parent vs parent members
Hi Mulyadi,

You were right. I found this code, which is called from the ptrace_attach function:
http://lxr.free-electrons.com/source/kernel/ptrace.c#L41

Thanks :)

On 27 February 2012 08:45, Vimal wrote:
> Hi Mulyadi,
>
> On 26 February 2012 23:48, Mulyadi Santosa wrote:
>> I am bit rusty here, but AFAIK sigchld is thrown to the process who
>> ptrace another process. CMIIW
>
> sigchld's definition [1] says it's sent to the parent process when a
> child terminates.
>
> But I do agree that the notion of a parent seems a bit ambiguous when
> a process is being ptraced.
>
> ptrace is what I found when looking through some websites, and you may
> very well be correct. But as always, it is good to confirm through
> code. :) Let me try checking the ptrace functions.
>
> Thanks!
> --
> Vimal

--
Vimal
Re: task_struct's real_parent vs parent members
Hi Mulyadi,

On 26 February 2012 23:48, Mulyadi Santosa wrote:
> I am bit rusty here, but AFAIK sigchld is thrown to the process who
> ptrace another process. CMIIW

sigchld's definition [1] says it's sent to the parent process when a child terminates.

But I do agree that the notion of a parent seems a bit ambiguous when a process is being ptraced.

ptrace is what I found when looking through some websites, and you may very well be correct. But as always, it is good to confirm through code. :) Let me try checking the ptrace functions.

Thanks!
--
Vimal
task_struct's real_parent vs parent members
Hi,

I am looking through kernel 3.2's task_struct, which has two pointers for parents: real_parent and parent (http://lxr.free-electrons.com/source/include/linux/sched.h#L1313). I would like to know why there are two pointers, and how the two differ.

init_task's parent and real_parent are initialised to point to itself. Since the main way processes are created is through fork()/clone(), I tried following the do_fork function to see where a newly created process's parent is set, but I am unable to find it. I see that do_fork duplicates the current task_struct and only updates the new process's real_parent, depending on the flags passed via the clone/fork syscall.

The documentation at sched.h:1313 says that "parent" refers to the parent task that would receive SIGCHLD (i.e., the one that issues wait4()). I followed the wait4() syscall to do_wait(), but I am still not able to find where the task's parent is updated.

Is there something I'm missing?

Thanks,
--
Vimal
linux hrtimer affinity
Hi all,

Is there a way to set the affinity of an hrtimer callback, so that it executes on a particular logical CPU?

The reason is that I have an hrtimer callback that schedules a tasklet. If the timer callback executes on a different CPU than the one it was enqueued on, then the tasklet is scheduled on that same CPU as the callback, which makes it difficult to reason about serialising access to per-CPU data structures.

Any ideas?

Thanks,
--
Vimal
Re: percpu variables from softirq context
Hi Mulyadi,

On 7 December 2011 10:48, Mulyadi Santosa wrote:
> On Wed, Dec 7, 2011 at 06:08, Vimal wrote:
>> Hi,
>>
>> I am trying to allocate a per-cpu variable from a softirq context, but
>> the documentation for "alloc_percpu" says that the variable is
>> allocated in GFP_KERNEL context, which can sleep.
>>
>> Is there a way around this?
>
> Perhaps, by design, you should first rethink: could you do that
> outside of softirq context? Perhaps by deferring it to a workqueue?

Yes, that's definitely a possibility; I was just wondering if there was a reason behind not allowing it, or if I had missed something...

Thanks,
--
Vimal
percpu variables from softirq context
Hi,

I am trying to allocate a per-cpu variable from a softirq context, but the documentation for "alloc_percpu" says that the variable is allocated in GFP_KERNEL context, which can sleep.

Is there a way around this?

Thanks,
--
Vimal
PCI device disappeared
A (pointer dereference) bug in my kernel module crashed the system, and when I rebooted, a network PCI device went missing. Several reboots didn't bring back the device, but a cold reboot did!

I am curious: what could have caused this issue?

--
Vimal
What is RTNL lock?
Hi all,

In the Linux networking code, I see a lot of comments that say "Must be called with RTNL lock." What is this lock? I tried searching for it but couldn't find any explanation of what it is...

Thanks
--
Vimal
TSO support for veth
Hi all,

It seems like OpenVZ has had TCP Segmentation Offload (TSO) support for veth for quite a while [1], but the mainline kernel doesn't have it.

I checked what it takes to add TSO support, and it looks like OpenVZ's veth.c (call it vzeth) has the following:

1. When initialising a virtual net_device "dev", vzeth declares dev->hw_features = ... (other features) | NETIF_F_TSO.
2. The ethtool_ops structure contains two additional handlers: get_tso and set_tso.
3. get_tso points to ethtool_op_get_tso.
4. set_tso points to a special function that invokes ethtool_op_set_tso on both ends of the vzeth.

I took a copy of the mainline veth.c, repeated the above steps and introduced a printk to make sure that the set_tso function gets invoked. Here's the diff: http://pastie.org/2924399 (space indent, sorry)

But I see the following:

1. The modified veth driver loads. (good)
2. When I do: ethtool -K tso on, I don't see an error message. (good)
3. When I query with ethtool -k, I don't see TSO set to on. (bad)
4. When I check dmesg, I see no sign that the "set_tso" function has been invoked. (bad)

Am I doing something wrong?

[1] http://wiki.openvz.org/TSO

--
Vimal
Ownership of sk_buff
Hi all,

When a protocol like TCP or UDP creates an sk_buff and passes it down to the layer 3 and layer 2 protocol functions, which module has ownership of the buffer as it gets passed down? Is it the responsibility of the caller or the callee to free the sk_buff? Are there any exceptions?

Thanks,
--
Vimal
Re: Obtaining a list of open sockets from "struct task_struct"
Hi,

On 11 November 2011 09:42, Nuno Martins wrote:
> If you are sure that that file descriptor is a socket then you can
> cast to struct socket, the field "void * private_data" in struct file
> [1] .

Thanks! That helped.

> But not all file descriptors are sockets so you have a way to be sure
> that you are dealing with a socket, you have a macro
>
> #define S_ISSOCK(m) (((m) & S_IFMT) == S_IFSOCK)
>
> that macro is in [2] .

That's an explicit way. But instead of looking this up from the inode information, I think you can also deduce whether the file pointer is a socket by checking if "file.f_op == &socket_file_ops". :)

> Glad to help, i have searched all that because i had a project that i
> needed to know which file descriptors were sockets, so i had to search
> this information, i hope it's now a little be clear to you.

Yes, it's much clearer now. Thanks a lot!

--
Vimal
Obtaining a list of open sockets from "struct task_struct"
Hi all,

I am trying to understand more about kernel data structures and I would like to know how to obtain a list of TCP/UDP sockets, starting from a "struct task_struct" variable "task".

So far, I have understood the following. Please correct me if I am wrong! :)

- An open file descriptor is represented by a "struct file *" in the task's file table: task->files->fdt->fd
- The file tables are organised as a linked list
- The file table contains a structure fd_array, that is an array of "struct file *", each representing an open file

But almost all operations in the TCP code start from a "struct sock". Are the "struct file" and "struct sock" somehow connected?

Thanks,
--
Vimal
Re: CPU usage accounting
Hi Mulyadi :)

On 14 September 2011 12:44, Mulyadi Santosa wrote:
>> From what I understand: In case (1), the kernel code executes in the
>> context of the application, so the CPU cycles are accounted directly
>> to process that called write() (or send/sendto).
>
> excellent thinking, however AFAIK sometimes (or most of the times
> now?) data sending is done in asynchronous style. so the counting
> might be not so accurate since we don't really know how much the data
> that are transmitted...

Ah, I completely forgot the asynchronous case. Thanks for pointing it out!

>> But in case (2),
>
> excellent thinking. I place my bet on ksoftirqd

Thanks. I think it makes sense. Let me think of a way to actually confirm this. If you know of a way, do chip in :-)

thanks,
--
Vimal
CPU usage accounting
Hi,

In the following cases, how does the CPU work done by the kernel on behalf of the application get accounted for?

1. When an application writes to a TCP/UDP socket, the networking stack does transmit-side processing.
2. When an application receives data on a TCP/UDP socket, the networking stack does receive-side processing.

From what I understand: in case (1), the kernel code executes in the context of the application, so the CPU cycles are accounted directly to the process that called write() (or send/sendto).

But in case (2), unless some processing is done, it is impossible to know which application is going to receive this packet. Whom do these cycles get charged to?

Thanks,
--
Vimal
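One way to look at case (1) from user space is getrusage(), which reports the system (kernel-mode) CPU time charged to the calling process. The sketch below is an illustration under assumptions: it uses a local AF_UNIX socket pair rather than TCP/UDP, it only observes time charged in syscall context (it says nothing about softirq-side work in case (2)), and the helper names are hypothetical.

```c
#include <string.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Microseconds of system CPU time charged to this process so far,
 * or -1 if getrusage() fails. */
static long long self_system_usec(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return (long long)ru.ru_stime.tv_sec * 1000000LL + ru.ru_stime.tv_usec;
}

/* Hypothetical helper: pushes `rounds` 4 KB buffers through a local
 * socket pair and returns the system time (in microseconds) charged to
 * the caller while doing so, or -1 on error. */
long long system_usec_for_socket_io(int rounds)
{
    char buf[4096];
    long long before, after;
    int sv[2], i;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    memset(buf, 'x', sizeof(buf));

    before = self_system_usec();
    for (i = 0; i < rounds; i++) {
        size_t got = 0;

        if (write(sv[0], buf, sizeof(buf)) != (ssize_t)sizeof(buf))
            break;
        /* Drain what was just written so the socket buffer never fills. */
        while (got < sizeof(buf)) {
            ssize_t n = read(sv[1], buf + got, sizeof(buf) - got);

            if (n <= 0)
                goto out;
            got += (size_t)n;
        }
    }
out:
    after = self_system_usec();
    close(sv[0]);
    close(sv[1]);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

A call such as system_usec_for_socket_io(100000) gives a rough figure for how much kernel time the write()/read() path added to the process's own account; the resolution of ru_stime depends on the kernel's accounting configuration.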
Re: Snooping on sockets/file descriptors
Hi Javier,

> If you want to do it in the kernel, you can write a loadable kernel
> module to register netfilter hooks and obtain the socket buffers
> (sk_buff).

Thanks. As I mentioned in my earlier posts, I didn't want netfilter/pcap because they give me access to packets. I would like access to the stream of data that is read by the application using the read()/recvmsg()/etc. syscalls.

@all: thanks for the help; I think I've figured out how to do it. I manually traced the system call to see which function is ultimately called for a read on a socket. It turns out to be skb_copy_datagram_iovec(..) (function definition: http://lxr.free-electrons.com/source/net/ipv4/tcp.c#L1668). I could hook onto this function using kprobes and get the data that is read.

Thanks!
--
Vimal
Re: Snooping on sockets/file descriptors
> then, something like dtrace or systemtap? IMO you're looking for kinda
> combo of kernel mode + user land "sniffer"... the user land sniffer,
> in it's very simple form, is by using LD_PRELOAD ...

dtrace seems fine and is similar to ptrace. But then, one would have to enumerate all possible syscalls that the application can issue to read data. For example, it could use read(), recvfrom(), recvmsg(), or even syscall(syscall#, args...).

I wonder if LD_PRELOAD can be applied to a program without shutting it down. ptrace fits the bill here, except for the above problem.

Thanks!
--
Vimal
Re: Snooping on sockets/file descriptors
Hi Daniel,

> How about tcpdump?

Thanks for the suggestion. tcpdump is good, but it doesn't solve all problems. There are a few reasons:

* TCP packets could arrive out of order
* The data needn't belong to a valid TCP connection
* The app could just discard data (close/flush/etc)

In short, there is a lot of state and complex logic that acts on the packets before the data is seen by the application. Given the complexity (such as wide variations in TCP implementations), I am not sure if reimplementing them is a good idea, even if it's possible.

Thanks,
--
Vimal
Snooping on sockets/file descriptors
Hi,

Is it possible for an application (say "snoop", with sufficient privileges) to monitor data on any socket/file descriptor in the system?

Here's an example: suppose we have a browser and it creates a TCP socket to connect to a URL. Whenever the browser issues a read() and data is pushed to user space, I want "snoop" to be notified and given a copy of the same data that the browser read.

ptrace can be used to do it, but then there are several ways the app can read data. It could use read(), or recv(), or recvmsg(). Is there a better way to deal with this complexity?

It's like the action of "tee" on any socket/file descriptor in the system.

--
Vimal
Re: User space context switch
Hi Mohit,

On 18 March 2011 13:07, mohit verma wrote:
> hi all,
> is it possible to write a user space code to calculate the context switch
> time of a process ? I mean , how can the user space code know about the
> working of the scheduler?
> Is there any system call API to interact with scheduler or something other
> to interact with , regarding this problem?

Do check the function tracing framework in the Linux kernel. Links:

* http://lwn.net/Articles/322666/
* http://lwn.net/Articles/290277/

sched_switch is the tracer you're looking for.

--
Vimal
Re: Create a one-to-many tunnel
Hi Matthias,

On 15 March 2011 06:26, Matthias Brugger wrote:
> your question isn't clear to me. anyway, have a look on the tun/tap kernel
> module, so you would be able to do the implementation in userspace (might be
> good for a first prototype or even good enough...).

Sorry if I wasn't clear. I wanted to create an IP in IP tunnel interface. As far as I know, tunnel creation support is available for point-to-point tunnels; i.e., we set up an IP in IP tunnel from host A (fixed IP) to host B (fixed IP), and every packet that is transmitted via the tunnel iface at A is encapsulated with B's IP address and routed from A to B.

I wanted an iface that does encapsulation irrespective of what the destination IP is; i.e., any packet that is transmitted from the tunnel iface on host A is encapsulated within another IP header.

The reason I need this is that A can have multiple interfaces to reach B. Depending on which iface the packet was sent on, I need the source IP to be changed (which is basically SNAT). But for some reason, I require the protocol stack at A to bind to one virtual IP address and B to know what the virtual IP address is.

Yes, tun/tap should be easy to implement in user space. I was initially looking at off-the-shelf components. I looked at the source code of the IP in IP tunnel and modified it accordingly to create a kernel module as per my requirements.

Thanks!
--
Vimal
Create a one-to-many tunnel
Hi,

I wish to create an IP in IP tunnel interface that works as follows.

At the sender:

* The interface checks the dst IP address on the IP packet that it receives.
* It encapsulates the IP packet inside another IP packet with:
  Field 1: source address = the interface's IP address
  Field 2: dst address = the packet's dst IP address
* The interface now transmits the packet.

This is different from normal IPinIP tunnels because it is not point-to-point. The operation in Field 2 is similar to NAT. Also, this is different from SNAT, because I want the receiver to know what the original source IP is.

At the receiver, assume that there is a stack that understands this special IPinIP packet and has a way to handle it.

If it's too specific, then I do not mind implementing it.

Thanks,
--
Vimal
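The sender-side steps listed above can be sketched in user-space C on a raw byte buffer. This is only an illustration of the header manipulation, assuming Linux/glibc's struct iphdr, an inner packet without special handling of fragmentation, and a fixed outer TTL of 64; it is not the kernel module described in the thread, and ipip_encap is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum (RFC 1071) over `len` bytes, returned in network
 * byte order so it can be stored directly in the header. */
static uint16_t ip_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)(p[0] << 8);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return htons((uint16_t)~sum);
}

/* Hypothetical helper: wraps the IPv4 packet in `inner` in an outer
 * IPv4 header. Outer source = iface_addr (network byte order), outer
 * destination = the inner packet's own destination. Returns the number
 * of bytes written to `out`, or 0 if the input is malformed or `out`
 * is too small. */
size_t ipip_encap(const uint8_t *inner, size_t inner_len,
                  uint32_t iface_addr, uint8_t *out, size_t out_len)
{
    struct iphdr in_hdr, outer;
    size_t total = sizeof(outer) + inner_len;

    if (inner_len < sizeof(in_hdr) || total > out_len || total > 0xffff)
        return 0;
    memcpy(&in_hdr, inner, sizeof(in_hdr));
    if (in_hdr.version != 4)
        return 0;

    memset(&outer, 0, sizeof(outer));
    outer.version = 4;
    outer.ihl = 5;                    /* 20-byte header, no options */
    outer.ttl = 64;                   /* assumed default */
    outer.protocol = IPPROTO_IPIP;    /* payload is another IPv4 packet */
    outer.tot_len = htons((uint16_t)total);
    outer.saddr = iface_addr;         /* Field 1: the interface's address */
    outer.daddr = in_hdr.daddr;       /* Field 2: the packet's own dst */
    outer.check = ip_checksum(&outer, sizeof(outer));

    memcpy(out, &outer, sizeof(outer));
    memcpy(out + sizeof(outer), inner, inner_len);
    return total;
}
```

Because the inner packet is copied unmodified, its original source address survives inside the payload, which is what lets the receiver recover it after stripping the outer header.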