Dropping one packet in a coalesced skb

2012-05-02 Thread Vimal
Hi,

I have LSO enabled on my machine, and hence a single skb can
(possibly) be multiple packets on the wire.  I am writing a kernel
module that tries dropping packets.  Is there a way to drop just one
1500B sized TCP packet inside an skb that can be a 64KB TCP segment?
This segment has about 40 packets due to LSO, and I don't want to drop
all of them; just a few (2--3) packets.

Can the effect be achieved if I modify skb_shinfo(skb)->gso_segs, or nr_frags?
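
For scale, the number of wire packets that such an skb turns into is roughly its TCP payload length divided by the MSS (gso_size), rounded up. Here is a minimal user-space sketch of that arithmetic only. The helper name gso_wire_packets and the 1448-byte MSS used below are assumptions for illustration (a 1500B MTU minus IP, TCP and timestamp-option headers), and this is not kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a kernel API): how many wire packets a
 * segment of payload_len bytes becomes when it is cut into pieces of
 * at most mss bytes.  This is the quantity that
 * skb_shinfo(skb)->gso_segs is meant to describe. */
static size_t gso_wire_packets(size_t payload_len, size_t mss)
{
    assert(mss > 0);                        /* avoid division by zero */
    return (payload_len + mss - 1) / mss;   /* ceiling division */
}
```

With these assumed numbers, gso_wire_packets(65536, 1448) gives 46, which is in the same range as the "about 40 packets" mentioned above.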

Thanks,
-- 
Vimal

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: task_struct's real_parent vs parent members

2012-02-27 Thread Vimal
Hi Mulyadi,

You were right.  I found this code, which is called from the
ptrace_attach function:

http://lxr.free-electrons.com/source/kernel/ptrace.c#L41
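
To make the distinction concrete, here is a toy user-space model of what I understand that code to do: attaching a tracer redirects only "parent", while "real_parent" keeps pointing at the process that created the child. The struct toy_task and the toy_* functions are hypothetical names for illustration, not kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the two task_struct fields discussed in this thread. */
struct toy_task {
    const char *comm;
    struct toy_task *real_parent;  /* the process that created us */
    struct toy_task *parent;       /* who gets SIGCHLD / wait4() reports */
};

/* On creation, both pointers start out referring to the creator. */
static void toy_fork(struct toy_task *child, struct toy_task *creator)
{
    assert(child != NULL && creator != NULL);
    child->real_parent = creator;
    child->parent = creator;
}

/* On ptrace attach, only 'parent' is redirected to the tracer. */
static void toy_ptrace_link(struct toy_task *child, struct toy_task *tracer)
{
    assert(child != NULL && tracer != NULL);
    child->parent = tracer;
}

/* On detach, 'parent' falls back to the real parent. */
static void toy_ptrace_unlink(struct toy_task *child)
{
    assert(child != NULL);
    child->parent = child->real_parent;
}
```

In this model, a child forked by a shell and then attached to by a debugger has parent pointing at the debugger and real_parent still pointing at the shell.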

Thanks :)

On 27 February 2012 08:45, Vimal wrote:
> Hi Mulyadi,
>
> On 26 February 2012 23:48, Mulyadi Santosa wrote:
>> I am a bit rusty here, but AFAIK sigchld is thrown to the process that
>> ptraces another process. CMIIW
>>
>
> sigchld's definition [1] says it's sent to the parent process when a
> child terminates.
>
> But I do agree that the notion of a parent seems a bit ambiguous when
> a process is being ptraced.
>
> ptrace is what I found when looking through some websites, and you may
> very well be correct.   But as always, it is good to confirm through
> code. :)   Let me try checking the ptrace functions.
>
> Thanks!
> --
> Vimal



-- 
Vimal


Re: task_struct's real_parent vs parent members

2012-02-27 Thread Vimal
Hi Mulyadi,

On 26 February 2012 23:48, Mulyadi Santosa wrote:
> I am a bit rusty here, but AFAIK sigchld is thrown to the process that
> ptraces another process. CMIIW
>

sigchld's definition [1] says it's sent to the parent process when a
child terminates.

But I do agree that the notion of a parent seems a bit ambiguous when
a process is being ptraced.

ptrace is what I found when looking through some websites, and you may
very well be correct.   But as always, it is good to confirm through
code. :)   Let me try checking the ptrace functions.

Thanks!
-- 
Vimal


task_struct's real_parent vs parent members

2012-02-26 Thread Vimal
Hi,

I am looking through kernel 3.2's task_struct, which has two pointers
for parents: real_parent and parent
(http://lxr.free-electrons.com/source/include/linux/sched.h#L1313).  I
would like to know why there are two pointers, and how these two
differ.

init_task's parent and real_parent are initialised to point to itself.
Since the main way processes are created is through fork()/clone(), I
tried following the do_fork function to see where a newly created
process's parent is set, but I am unable to find it.  I see that
do_fork dups the current task_struct, and only updates the new
process's real_parent, depending on flags (passed via clone/fork
syscall).

The documentation in sched.h:1313 says that "parent" refers to the
parent task that would receive SIGCHLD (i.e., the one that issues
wait4()).  I followed the wait4() syscall to do_wait(), but I am still
not able to find where the task's parent is updated.

Is there something I'm missing?

Thanks,
-- 
Vimal


linux hrtimer affinity

2012-02-14 Thread Vimal
Hi all,

Is there a way to set the affinity of an hrtimer callback, so that it
executes on a particular logical CPU?

The reason is that I have an hrtimer callback that schedules a tasklet.
If the timer callback executes on a different CPU from the one it was
enqueued on, the tasklet is scheduled on the CPU running the callback,
which makes it difficult to reason about serialising access to per-CPU
data structures.

Any ideas?

Thanks,
-- 
Vimal


Re: percpu variables from softirq context

2011-12-08 Thread Vimal
Hi Mulyadi

On 7 December 2011 10:48, Mulyadi Santosa wrote:
> On Wed, Dec 7, 2011 at 06:08, Vimal wrote:
>> Hi,
>>
>> I am trying to allocate a per-cpu variable from a softirq context, but
>> the documentation for "alloc_percpu" says that the variable is
>> allocated in GFP_KERNEL context, which can sleep.
>>
>> Is there a way around this?
>
> perhaps, by design, you should first re-think, could you do that
> outside of softirq context? perhaps by deferring it into workqueue?

Yes, that's definitely a possibility; I was just wondering if there
was a reason behind not allowing it, or if I had missed something...

Thanks,
-- 
Vimal


percpu variables from softirq context

2011-12-06 Thread Vimal
Hi,

I am trying to allocate a per-cpu variable from a softirq context, but
the documentation for "alloc_percpu" says that the variable is
allocated in GFP_KERNEL context, which can sleep.

Is there a way around this?

Thanks,
-- 
Vimal


PCI device disappeared

2011-12-04 Thread Vimal
A (pointer dereference) bug in my kernel module crashed the system,
and when I rebooted, a network PCI device went missing.   Several
reboots didn't bring back the device, but a cold reboot did!   I am
curious: what could have caused this issue?

-- 
Vimal


What is RTNL lock?

2011-11-27 Thread Vimal
Hi all,

In the Linux networking code, I see a lot of comments that say "Must
be called with RTNL lock."

What is this lock?  I tried searching for it but couldn't find any
explanation of what it is...

Thanks
--
Vimal


TSO support for veth

2011-11-26 Thread Vimal
Hi all,

It seems like OpenVZ has had TCP Segmentation Offload (TSO) support for
veth for quite a while[1], but the mainline kernel doesn't have it.  I
checked what it takes to add TSO support, and it looks like OpenVZ's
veth.c (call it vzeth) has the following:

1. When initialising a virtual net_device "dev", vzeth declares
dev->hw_features = ... (other features) | NETIF_F_TSO.
2. The ethtool_ops structure contains two additional handlers: get_tso
and set_tso.
3. get_tso points to ethtool_op_get_tso
4. set_tso points to a special function, that invokes
ethtool_op_set_tso on both the ends of the vzeth.

I took a copy of the mainline veth.c, repeated the above steps and
introduced a printk to make sure that the set_tso function gets
invoked.

Here's the diff:  http://pastie.org/2924399 (space indent, sorry)

But I see the following:
1. The modified veth driver loads. (good)
2. When I do: ethtool -K  tso on, I don't see an error message. (good)
3. When I query: ethtool -k , I don't see TSO set on (bad)
4. When I check dmesg, I don't see any sign that the "set_tso" function
has been invoked. (bad)

Am I doing something wrong?

[1] http://wiki.openvz.org/TSO

-- 
Vimal


Ownership of sk_buff

2011-11-23 Thread Vimal
Hi all,

When a protocol like TCP or UDP creates an sk_buff and passes it down
to the layer 3 and layer 2 protocol functions, which module has
ownership of the buffer as it gets passed down?   Is it the
responsibility of the caller, or the callee to free the sk_buff?  Are
there any exceptions?

Thanks,
-- 
Vimal


Re: Obtaining a list of open sockets from "struct task_struct"

2011-11-12 Thread Vimal
Hi,

On 11 November 2011 09:42, Nuno Martins wrote:
> If you are sure that that file descriptor is a socket then you can
> cast the field "void * private_data" in struct file [1] to
> struct socket.

Thanks!   That helped.

>
> But not all file descriptors are sockets, so you need a way to be sure
> that you are dealing with a socket; there is a macro
>
> #define S_ISSOCK(m)     (((m) & S_IFMT) == S_IFSOCK)
>
> that macro is in [2] .

That's an explicit way.  But instead of looking this up from the
inode information, I think you can also deduce whether the file
pointer is a socket by checking if "file->f_op == &socket_file_ops".
:)
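
The S_ISSOCK check can also be tried from user space via fstat(), which may be handy for cross-checking what a kernel module reports. A minimal sketch follows. fd_is_socket and count_sockets are hypothetical helper names, and this is only the user-space counterpart of the mode check, not the f_op comparison:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>   /* socketpair(), handy for trying this out */
#include <unistd.h>       /* pipe(), close() */

/* Hypothetical helper: returns 1 if fd refers to a socket, 0 if it
 * refers to something else, and -1 if fstat() fails (e.g. bad fd). */
static int fd_is_socket(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return S_ISSOCK(st.st_mode) ? 1 : 0;
}

/* Count how many of the n descriptors in fds[] are sockets. */
static int count_sockets(const int *fds, int n)
{
    int i, count = 0;

    assert(fds != NULL && n >= 0);
    for (i = 0; i < n; i++)
        if (fd_is_socket(fds[i]) == 1)
            count++;
    return count;
}
```

For example, both ends of a socketpair() should be reported as sockets, while either end of a pipe() should not.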


>
> Glad to help. I searched for all of this because I had a project in
> which I needed to know which file descriptors were sockets. I hope
> it's a little clearer to you now.
>


Yes, it's much clearer now.  Thanks a lot!

-- 
Vimal


Obtaining a list of open sockets from "struct task_struct"

2011-11-11 Thread Vimal
Hi all,

I am trying to understand more about kernel data structures and I
would like to know how to obtain a list of TCP/UDP sockets, starting
from a "struct task_struct" variable "task".

So far, I have understood the following:  Please correct me if I am wrong! :)

- An open file descriptor is represented by a "struct file *" in the
task's file table: task->files->fdt->fd
- The file tables are organised as a linked list
- The file table contains a structure fd_array, that is an array of
"struct file *", each representing an open file

But, almost all operations in the TCP code start from a "struct sock".
  Are the "struct file" and "struct sock" somehow connected?

Thanks,
-- 
Vimal


Re: CPU usage accounting

2011-09-15 Thread Vimal
Hi Mulyadi :)

On 14 September 2011 12:44, Mulyadi Santosa wrote:
>> From what I understand: In case (1), the kernel code executes in the
>> context of the application, so the CPU cycles are accounted directly
>> to the process that called write() (or send/sendto).
>
> excellent thinking, however AFAIK sometimes (or most of the time
> now?) data sending is done in asynchronous style. so the counting
> might not be so accurate since we don't really know how much data
> is transmitted...

Ah, I completely forgot the asynchronous case.  Thanks for pointing it out!

>
>>But in case (2),
>
> excellent thinking.  I place my bet on ksoftirqd
>

Thanks.  I think it makes sense.   Let me think of a way to actually
confirm this.  If you know of a way, do chip in :-)

thanks,
-- 
Vimal


CPU usage accounting

2011-09-14 Thread Vimal
Hi,

In the following cases, how does the CPU work done by the kernel on
behalf of the application get accounted for?

1. When an application writes to a TCP/UDP socket, the networking
stack does transmit side processing.
2. When an application receives data on a TCP/UDP socket, the
networking stack does receive side processing.

From what I understand: In case (1), the kernel code executes in the
context of the application, so the CPU cycles are accounted directly
to the process that called write() (or send/sendto).  But in case (2),
unless some processing is done, it is impossible to know which
application is going to receive this packet.  Whom do these cycles get
charged to?

Thanks,
-- 
Vimal


Re: Snooping on sockets/file descriptors

2011-04-01 Thread Vimal
Hi Javier,

>
> If you want to do it in the kernel, you can write a loadable kernel
> module to register netfilter hooks and obtain the socket buffers
> (sk_buff).


Thanks.

If you see my earlier posts, I didn't want netfilter/pcap because they
give me access to packets.  I would like access to the stream of data
that is read by the application using read()/recvmsg()/etc syscalls.

@all: thanks for the help; I think I've figured out how to do it.  I
manually traced the system call to see which one would be called
ultimately, for read on a socket.

It turns out that skb_copy_datagram_iovec(..) is called ultimately (fn
defn: http://lxr.free-electrons.com/source/net/ipv4/tcp.c#L1668).

I could hook onto this function using kprobes and get the data that is read.

Thanks!
-- 
Vimal


Re: Snooping on sockets/file descriptors

2011-04-01 Thread Vimal
>
> then, something like dtrace or systemtap? IMO you're looking for kinda
> combo of kernel mode + user land "sniffer"... the user land sniffer,
> in its very simple form, is by using LD_PRELOAD ...
>

dtrace seems fine and is similar to ptrace.  But then, one would have
to enumerate all possible syscalls that the application can issue to
read data.  For example, it could use read(), recvfrom(), recvmsg(), or
even syscall(syscall#, args...).

I wonder if LD_PRELOAD can be done on a program without shutting it
down.  ptrace fits the bill here, except for the above problem.

Thanks!

-- 
Vimal


Re: Snooping on sockets/file descriptors

2011-03-31 Thread Vimal
Hi Daniel,

>
> How about tcpdump?
>

Thanks for the suggestion.

tcpdump is good, but it doesn't solve all problems.  There are a few reasons:

* TCP packets could arrive out of order
* The data needn't belong to a valid TCP connection
* The app could just discard data (close/flush/etc)

In short, there is a lot of state and complex logic which acts on the
packets before they are seen by the application.

Given the complexity (such as wide variations in TCP implementations),
I am not sure if reimplementing that logic is a good idea, even if it's
possible.

Thanks,
-- 
Vimal


Snooping on sockets/file descriptors

2011-03-31 Thread Vimal
Hi,

Is it possible for an application (say "snoop", with sufficient
privileges) to monitor data on any socket/file descriptor in the
system?

Here's an example:  suppose we have a browser and it creates a tcp
socket to connect to a URL.  Whenever the browser issues a read() and
data is pushed to user space, I want "snoop" to be notified and given
a copy of the same data that the browser read.

ptrace can be used to do it, but then there are several ways the app
can read data.  It could use read(), or recv() or recvmsg().  Is there
a better way to deal with this complexity?

It's like the action of "tee" on any socket/file descriptor in the system.

-- 
Vimal


Re: User space context switch

2011-03-21 Thread Vimal
Hi Mohit,

On 18 March 2011 13:07, mohit verma wrote:
> hi all,
>   is it possible to write user space code to calculate the context switch
> time of a process? I mean, how can the user space code know about the
> working of the scheduler?
> Is there any system call API to interact with the scheduler, or something
> else to interact with, regarding this problem?

Do check the function tracing framework in the Linux Kernel.

Links:
* http://lwn.net/Articles/322666/
* http://lwn.net/Articles/290277/

sched_switch is the tracer you're looking for.
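
If a rough number from pure user space is enough, a different and much cruder technique than the tracer is the classic pipe ping-pong: two processes bounce one byte back and forth, and the round-trip time bounds the cost of the switches involved. A minimal sketch under those assumptions follows. The helper names now_ns and pingpong_rtt_ns are hypothetical, and the result also includes pipe read()/write() overhead, so treat it as an upper bound only:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: bounce one byte between parent and child over
 * two pipes 'rounds' times.  Returns the average round-trip time in
 * nanoseconds, or -1 on error. */
static int64_t pingpong_rtt_ns(int rounds)
{
    int p2c[2], c2p[2], i;
    char byte = 'x';
    pid_t pid;
    int64_t start, elapsed;

    assert(rounds > 0);
    if (pipe(p2c) != 0)
        return -1;
    if (pipe(c2p) != 0) {
        close(p2c[0]);
        close(p2c[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: echo every byte back until the parent closes its end. */
        close(p2c[1]);
        close(c2p[0]);
        while (read(p2c[0], &byte, 1) == 1)
            if (write(c2p[1], &byte, 1) != 1)
                break;
        _exit(0);
    }

    close(p2c[0]);
    close(c2p[1]);
    start = now_ns();
    for (i = 0; i < rounds; i++)
        if (write(p2c[1], &byte, 1) != 1 || read(c2p[0], &byte, 1) != 1)
            break;
    elapsed = now_ns() - start;

    close(p2c[1]);              /* EOF makes the child leave its loop */
    close(c2p[0]);
    waitpid(pid, NULL, 0);
    return i == rounds ? elapsed / rounds : -1;
}
```

Pinning both processes to one CPU (e.g. with taskset) should make the number more meaningful, since otherwise the two sides may simply run on different cores.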

-- 
Vimal


Re: Create a one-to-many tunnel

2011-03-15 Thread Vimal
Hi Matthias,

On 15 March 2011 06:26, Matthias Brugger wrote:
>
> your question isn't clear to me. anyway, have a look at the tun/tap kernel
> module, so you would be able to do the implementation in userspace (might be
> good for a first prototype or even good enough...).
>

Sorry if I wasn't clear.

I wanted to create an IP in IP tunnel interface.  As far as I know,
tunnel creation support is available for point to point tunnels; i.e.,
we set up an IP in IP tunnel from host A (fixed IP)---host B (fixed IP)
and every packet that is transmitted via the tunneled iface at A is
encapsulated with B's IP address and routed from A (to B).

I wanted an iface that does encapsulation irrespective of what the
destination IP is.

i.e., any packet that is transmitted from the tunnel iface on host A
is encapsulated within another IP header.

The reason I need this is that A can have multiple interfaces to reach
B.  Depending on which iface the packet was sent on, I need the source
IP to be changed (which is basically SNAT).  But for some reason, I
require protocol stack at A to bind to one virtual IP address and B to
know what the virtual IP address is.

Yes, tun/tap should be easy to implement in user space.  I was
initially looking at off-the-shelf components.

I looked at the source code of IP in IP tunnel and modified it
accordingly to create a kernel module as per my requirements.

Thanks!
-- 
Vimal


Create a one-to-many tunnel

2011-03-11 Thread Vimal
Hi,

I wish to create an IP in IP tunnel interface that works as follows:

At the sender:
* The interface checks the dst IP address on the IP packet that it receives.
* It encapsulates the IP packet inside another IP packet with:
   Field 1: source address = the interface's IP address
   Field 2: dst address = the packet's dst IP address
* The interface now transmits the packet
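
The sender-side steps above can be sketched in user space on raw bytes, roughly as follows. This is only an illustration under assumptions: encap_ipip and ipv4_checksum are hypothetical helper names, the outer header carries no options, and the TTL of 64 is an arbitrary choice; it is not the kernel's ipip code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV4_HLEN   20   /* IPv4 header without options */
#define PROTO_IPIP   4   /* IP protocol number for IP-in-IP */

/* Internet checksum (RFC 1071) over an even-length header. */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: wrap the IPv4 packet 'inner' in an outer header
 * whose source is iface_addr and whose destination is copied from the
 * inner packet.  Returns the total length written to 'out', or 0 if
 * 'out' is too small or 'inner' is not a plausible IPv4 packet. */
static size_t encap_ipip(const uint8_t *inner, size_t inner_len,
                         const uint8_t iface_addr[4],
                         uint8_t *out, size_t out_cap)
{
    size_t total = inner_len + IPV4_HLEN;
    uint16_t csum;

    assert(inner != NULL && iface_addr != NULL && out != NULL);
    if (inner_len < IPV4_HLEN || (inner[0] >> 4) != 4)
        return 0;
    if (total > 0xffff || total > out_cap)
        return 0;

    memset(out, 0, IPV4_HLEN);
    out[0] = 0x45;                     /* version 4, header of 5 words */
    out[2] = (uint8_t)(total >> 8);    /* total length, big-endian */
    out[3] = (uint8_t)(total & 0xff);
    out[8] = 64;                       /* TTL: arbitrary choice */
    out[9] = PROTO_IPIP;
    memcpy(out + 12, iface_addr, 4);   /* Field 1: interface's address */
    memcpy(out + 16, inner + 16, 4);   /* Field 2: inner packet's dst */

    csum = ipv4_checksum(out, IPV4_HLEN);
    out[10] = (uint8_t)(csum >> 8);
    out[11] = (uint8_t)(csum & 0xff);

    memcpy(out + IPV4_HLEN, inner, inner_len);  /* original packet intact */
    return total;
}
```

Because the inner packet is copied unmodified, its original source address is still there for the receiver to read after decapsulation.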

This is different from normal IPinIP tunnels because it is not
point-to-point.  The operation in Field 2 is similar to NAT.  Also,
this is different from SNAT, because I want the receiver to know what
the original source IP is.

At the receiver, assume that there is a stack that understands this
special IPinIP packet and has a way to handle it.

If it's too specific, then I do not mind implementing it.

Thanks,
-- 
Vimal
