From: Evgeniy Polyakov <[EMAIL PROTECTED]>
Date: Sat, 18 Feb 2006 19:11:40 +0300

> Hello developers.
> 
> I'm pleased to announce a combined patch for the kevent and network AIO
> subsystems, which are implemented using three system calls:
>  * kevent_ctl(), which fully controls the kevent subsystem.
>    It allows adding/modifying/removing kevents and waiting until either a
>    given number of kevents, or at least one, is ready.
>  * aio_send().
>    This system call allows a userspace buffer to be transferred
>    asynchronously to the remote host using a zero-copy mechanism.
>  * aio_recv().
>    Asynchronous receiving support.
> 
> The next step is to implement aio_sendfile() support.
> 
> Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

I did a minor review today. The scheme is interesting, but here
are some initial code review comments:

1) Please format to 80 columns.  I use a very generous 86-character
   wide emacs window and much of your code wrapped lines :-)

2) I do not see a real need for enumerations for all of the KEVENT_*
   constants; just use CPP defines.  You are setting their values
   explicitly anyway.
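   To make the suggestion concrete, here is a minimal sketch (the
   constant names are illustrative, not necessarily those in the patch):

```c
/* Instead of an enum whose members are assigned explicit values anyway:
 *
 *   enum kevent_type { KEVENT_SOCKET = 0, KEVENT_INODE = 1, ... };
 *
 * plain preprocessor defines carry the same information: */
#define KEVENT_SOCKET   0
#define KEVENT_INODE    1
#define KEVENT_TIMER    2
```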

3) Adding sk_kevent_flags to struct sock looks like real bloat; you
   have only 1 or 2 flags, so just allocate them in sk_flags.

   I also do not see why you need to hold the socket lock around the
   set/clear bit operations.  You are using atomic bit operations, and
   you are not testing anything other than local variable state
   ('valbool').
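   The locking point can be sketched in userspace, with C11 atomics
   standing in for the kernel's set_bit()/clear_bit() (the bit position
   and function names below are illustrative, not the patch's):

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SOCK_ASYNC_BIT 5  /* illustrative bit position within sk_flags */

/* Userspace stand-ins for the kernel's atomic set_bit()/clear_bit(). */
static void set_bit_flag(atomic_ulong *flags, int bit)
{
    atomic_fetch_or(flags, 1UL << bit);
}

static void clear_bit_flag(atomic_ulong *flags, int bit)
{
    atomic_fetch_and(flags, ~(1UL << bit));
}

/* Mirrors the setsockopt-style pattern: only the local 'valbool'
 * decides which way to flip the bit.  Since each flip is a single
 * atomic RMW and no other socket state is examined, there is nothing
 * for the socket lock to protect here. */
static void sock_set_async(atomic_ulong *sk_flags, bool valbool)
{
    if (valbool)
        set_bit_flag(sk_flags, SOCK_ASYNC_BIT);
    else
        clear_bit_flag(sk_flags, SOCK_ASYNC_BIT);
}
```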

4) I have 2 major problems with tcp_async_{send,recv}().

   a) Much duplication of existing tcp sendmsg/recvmsg code.
      Consolidation and code sharing is needed.

   b) A large chunk of it is protocol-agnostic; we could share that
      logic so that support for protocols such as DCCP could be
      added very simply.

5) Why is the ordering between sk_stream_set_owner_r() and
   __skb_queue_tail() important in this hunk?

@@ -3079,8 +3080,8 @@ queue_and_out:
                                    !sk_stream_rmem_schedule(sk, skb))
                                        goto drop;
                        }
-                       sk_stream_set_owner_r(skb, sk);
                        __skb_queue_tail(&sk->sk_receive_queue, skb);
+                       sk_stream_set_owner_r(skb, sk);
                }
                tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
                if(skb->len)

6) I disagree with turning off the prequeue just because the socket is
   in async mode:

@@ -3819,7 +3820,7 @@ int tcp_rcv_established(struct sock *sk,
                        if (tp->ucopy.task == current &&
                            tp->copied_seq == tp->rcv_nxt &&
                            len - tcp_header_len <= tp->ucopy.len &&
-                           sock_owned_by_user(sk)) {
+                           sock_owned_by_user(sk) && !sock_async(sk)) {

   If the user blocked on a socket operation and we can do prequeue
   processing, directly copying to userspace, we should.

   Or are we zero-copy'ing here?

7) More prequeue bypassing when socket is in async mode:

+       if (sock_async(sk)) {
+               spin_lock_bh(&sk->sk_lock.slock);
+               ret = tcp_v4_do_rcv(sk, skb);
+               spin_unlock_bh(&sk->sk_lock.slock);
+       } else {
+               bh_lock_sock(sk);
+               if (!sock_owned_by_user(sk)) {
+                       if (!tcp_prequeue(sk, skb))
+                               ret = tcp_v4_do_rcv(sk, skb);
+               } else
+                       sk_add_backlog(sk, skb);
+               bh_unlock_sock(sk);
+       }

I really think #6 and #7 need justification, because if we do
end up with net channels, they will go via the TCP prequeue
mechanism.