Re: iscsi target regression due to "tcp: remove prequeue support" patch

2018-01-18 Thread Mike Christie
On 01/18/2018 09:10 AM, Florian Westphal wrote:
>> It would indicate users providing their own ->sk_data_ready() callback
>> must be responsible for waking up a kthread context blocked on
>> sock_recvmsg(..., MSG_WAITALL), when a second ->sk_data_ready() is
>> received before the first sock_recvmsg(..., MSG_WAITALL) completes.
> 
> I agree, it looks like we need something like this?
> (not even build tested):
> 
> diff --git a/drivers/target/iscsi/iscsi_target_nego.c 
> b/drivers/target/iscsi/iscsi_target_nego.c
> index b686e2ce9c0e..3723f8f419aa 100644
> --- a/drivers/target/iscsi/iscsi_target_nego.c
> +++ b/drivers/target/iscsi/iscsi_target_nego.c
> @@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk)
> if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, >login_flags)) {
> write_unlock_bh(>sk_callback_lock);
> pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p \n", 
> conn);
> +   if (WARN_ON(iscsi_target_sk_data_ready == 
> conn->orig_data_ready))
> +   return;
> +   conn->orig_data_ready(sk);
> return;

This allows iscsi login to work for me.

I ran it against the target-pending for-next branch.


Re: iscsi target regression due to "tcp: remove prequeue support" patch

2018-01-15 Thread Mike Christie
On 01/15/2018 12:41 AM, Nicholas A. Bellinger wrote:
> Hey MNC & Co,
> 
> Ping on the earlier iscsi-target authentication login failures atop
> 4.14 + commit e7942d063 removing tcp prequeue support.
> 
> For reference, what is your pre 4.14 environment using for
> sysctl_tcp_low_latency..?

tcp_low_latency=1.

I have tested the current kernel with 1 and 0, and it does not make a
difference.

> 
> netdev folks, how would you like to proceed for -rc1..?
> 
> On Mon, 2018-01-08 at 22:32 -0800, Nicholas A. Bellinger wrote:
>> Hi MNC & Florian,
>>
>> (Adding net-dev + DaveM CC')
>>
>> Catching up on pre-holiday threads, thanks for the heads up.
>>
>> Comments below.
>>
>> On Wed, 2017-12-13 at 23:56 -0600, Mike Christie wrote:
>>> Hey Nick and Florian,
>>>
>>> Starting in 4.14 iscsi logins will fail around 50% of the time.
>>>
>>> I git bisected the issue down to this commit:
>>>
>>> commit e7942d0633c47c791ece6afa038be9cf977226de
>>> Author: Florian Westphal <f...@strlen.de>
>>> Date:   Sun Jul 30 03:57:18 2017 +0200
>>>
>>> tcp: remove prequeue support
>>>
>>> Nick, attached is the iscsi target log info when the login fails.
>>>
>>> You can see at:
>>>
>>> Dec 13 17:55:01 rhel73n1 kernel: Got Login Command, Flags 0x81, ITT:
>>> 0x, CmdSN: 0x, ExpStatSN: 0xf86dc69b, CID: 0, Length: 65
>>>
>>> we have got a login command and we seem to then go into
>>> iscsit_do_rx_data -> sock_recvmsg
>>>
>>> We seem to get stuck in there though, because we stay blocked until:
>>>
>>> Dec 13 17:55:01 rhel73n1 kernel: Entering iscsi_target_sk_data_ready:
>>> conn: 88b35cbb3000
>>> Dec 13 17:55:01 rhel73n1 kernel: Got LOGIN_FLAGS_READ_ACTIVE=1, conn:
>>> 88b35cbb3000 >>>>
>>>
>>> where initiator side timeout fires 15 seconds later and it disconnects
>>> the tcp connection, and we eventually break out of the recvmsg call:
>>>
>>> Dec 13 17:55:16 rhel73n1 kernel: Entering iscsi_target_sk_state_change
>>> Dec 13 17:55:16 rhel73n1 kernel: __iscsi_target_sk_check_close:
>>> TCP_CLOSE_WAIT|TCP_CLOSE,returning FALSE
>>>
>>> 
>>>
>>> Dec 13 17:55:16 rhel73n1 kernel: rx_loop: 68, total_rx: 68, data: 68
>>> Dec 13 17:55:16 rhel73n1 kernel: iscsi_target_do_login_rx after
>>> rx_login_io, 88b35cbb3000, kworker/2:2:1829
>>>
>>
>> Ok, the 3rd third login request payload (65 + 3 padded to 68 bytes)
>> containing CHAP_N + CHAP_R keys remains blocked on sock_recvmsg(), until
>> TPG login_timeout subsequently fires after 15 seconds of inactivity to
>> terminate this login attempt.
>>
>>> Is the iscsi target doing something incorrect in its use of
>>> sk_data_ready and sock_recvmsg or is the tcp patch at fault?
>>
>> From the logs, sk_data_ready() -> iscsi_target_sk_data_ready() callbacks
>> appear firing as expected.
>>
>> iscsi-target login does iscsit_rx_do_data() -> rx_data() ->
>> sock_recvmsg(..., MSG_WAITALL) from a system_wq kworker process context
>> after iscsi_target_sk_data_ready() callback queues up
>> iscsi_conn->login_work for execution, and sock_recvmsg() uses a single
>> struct kvec iovec for struct msg_hdr.
>>
>> AFAICT, iscsi-target uses blocking kernel socket reads from process
>> context, similar to kernel_recvmsg(..., MSG_WAITALL) with DRBD.
>>
>> Florian + DaveM, any idea why the removal of prequeue support is having
>> an effect here..?
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe target-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 



Re: [PATCH net-next 1/4] scsi_tcp: block BH in TCP callbacks

2016-05-18 Thread Mike Christie
On 05/17/2016 07:44 PM, Eric Dumazet wrote:
> iscsi_sw_tcp_data_ready() and iscsi_sw_tcp_state_change() were
> using read_lock(>sk_callback_lock) which is fine if caller
> disabled BH.
> 
> TCP stack no longer has this requirement and can run from
> process context.
> 
> Use read_lock_bh() variant to restore previous assumption.
> 
> Ideally this code could use RCU instead...
> 
> Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog")
> Fixes: d41a69f1d390 ("tcp: make tcp_sendmsg() aware of socket backlog")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Mike Christie <micha...@cs.wisc.edu>
> Cc: Venkatesh Srinivas <venkate...@google.com>
> ---
>  drivers/scsi/iscsi_tcp.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
> index 2e4c82f8329c..ace4f1f41b8e 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -131,10 +131,10 @@ static void iscsi_sw_tcp_data_ready(struct sock *sk)
>   struct iscsi_tcp_conn *tcp_conn;
>   read_descriptor_t rd_desc;
>  
> - read_lock(>sk_callback_lock);
> + read_lock_bh(>sk_callback_lock);
>   conn = sk->sk_user_data;
>   if (!conn) {
> - read_unlock(>sk_callback_lock);
> + read_unlock_bh(>sk_callback_lock);
>   return;
>   }
>   tcp_conn = conn->dd_data;
> @@ -154,7 +154,7 @@ static void iscsi_sw_tcp_data_ready(struct sock *sk)
>   /* If we had to (atomically) map a highmem page,
>* unmap it now. */
>   iscsi_tcp_segment_unmap(_conn->in.segment);
> - read_unlock(>sk_callback_lock);
> + read_unlock_bh(>sk_callback_lock);
>  }
>  
>  static void iscsi_sw_tcp_state_change(struct sock *sk)
> @@ -165,10 +165,10 @@ static void iscsi_sw_tcp_state_change(struct sock *sk)
>   struct iscsi_session *session;
>   void (*old_state_change)(struct sock *);
>  
> - read_lock(>sk_callback_lock);
> + read_lock_bh(>sk_callback_lock);
>   conn = sk->sk_user_data;
>   if (!conn) {
> - read_unlock(>sk_callback_lock);
> + read_unlock_bh(>sk_callback_lock);
>   return;
>   }
>   session = conn->session;
> @@ -179,7 +179,7 @@ static void iscsi_sw_tcp_state_change(struct sock *sk)
>   tcp_sw_conn = tcp_conn->dd_data;
>   old_state_change = tcp_sw_conn->old_state_change;
>  
> - read_unlock(>sk_callback_lock);
> + read_unlock_bh(>sk_callback_lock);
>  
>   old_state_change(sk);
>  }
> 

Reviewed and tested. Thanks

Acked-by: Mike Christie <micha...@cs.wisc.edu>


Re: [PATCH net-next 1/4] scsi_tcp: block BH in TCP callbacks

2016-05-18 Thread Mike Christie
On 05/17/2016 07:44 PM, Eric Dumazet wrote:
> iscsi_sw_tcp_data_ready() and iscsi_sw_tcp_state_change() were
> using read_lock(>sk_callback_lock) which is fine if caller
> disabled BH.
> 
> TCP stack no longer has this requirement and can run from
> process context.
> 
> Use read_lock_bh() variant to restore previous assumption.
> 
> Ideally this code could use RCU instead...
> 
> Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog")
> Fixes: d41a69f1d390 ("tcp: make tcp_sendmsg() aware of socket backlog")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Mike Christie <micha...@cs.wisc.edu>
> Cc: Venkatesh Srinivas <venkate...@google.com>
> ---
>  drivers/scsi/iscsi_tcp.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
> index 2e4c82f8329c..ace4f1f41b8e 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -131,10 +131,10 @@ static void iscsi_sw_tcp_data_ready(struct sock *sk)
>   struct iscsi_tcp_conn *tcp_conn;
>   read_descriptor_t rd_desc;
>  
> - read_lock(>sk_callback_lock);
> + read_lock_bh(>sk_callback_lock);
>   conn = sk->sk_user_data;
>   if (!conn) {
> - read_unlock(>sk_callback_lock);
> + read_unlock_bh(>sk_callback_lock);
>   return;
>   }
>   tcp_conn = conn->dd_data;
> @@ -154,7 +154,7 @@ static void iscsi_sw_tcp_data_ready(struct sock *sk)
>   /* If we had to (atomically) map a highmem page,
>* unmap it now. */
>   iscsi_tcp_segment_unmap(_conn->in.segment);
> - read_unlock(>sk_callback_lock);
> + read_unlock_bh(>sk_callback_lock);
>  }
>  
>  static void iscsi_sw_tcp_state_change(struct sock *sk)
> @@ -165,10 +165,10 @@ static void iscsi_sw_tcp_state_change(struct sock *sk)
>   struct iscsi_session *session;
>   void (*old_state_change)(struct sock *);
>  
> - read_lock(>sk_callback_lock);
> + read_lock_bh(>sk_callback_lock);
>   conn = sk->sk_user_data;
>   if (!conn) {
> - read_unlock(>sk_callback_lock);
> + read_unlock_bh(>sk_callback_lock);
>   return;
>   }
>   session = conn->session;
> @@ -179,7 +179,7 @@ static void iscsi_sw_tcp_state_change(struct sock *sk)
>   tcp_sw_conn = tcp_conn->dd_data;
>   old_state_change = tcp_sw_conn->old_state_change;
>  
> - read_unlock(>sk_callback_lock);
> + read_unlock_bh(>sk_callback_lock);
>  
>   old_state_change(sk);
>  }

Can I just confirm that nested bh lock calls like:

spin_lock_bh(lock1);
spin_lock_bh(lock2);

do something

spin_unlock_bh(lock2);
spin_unlock_bh(lock1);

is ok? It seems smatch sometimes warns about this.

I found this thread

https://lkml.org/lkml/2011/1/25/232

which says it is ok.


Re: [dm-devel] [PATCH 22/26] iscsi_tcp: Use ahash

2016-01-25 Thread Mike Christie
On 01/24/2016 07:19 AM, Herbert Xu wrote:
> This patch replaces uses of the long obsolete hash interface with
> ahash.
> 
> Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>
> ---
> 
>  drivers/scsi/iscsi_tcp.c|   54 
> ++--
>  drivers/scsi/iscsi_tcp.h|4 +--
>  drivers/scsi/libiscsi_tcp.c |   29 +--
>  include/scsi/libiscsi_tcp.h |   13 +-
>  4 files changed, 58 insertions(+), 42 deletions(-)
> 

iSCSI parts look ok.

Reviewed-by: Mike Christie <micha...@cs.wisc.edu>



Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-11-28 Thread Mike Christie

Anil Veerabhadrappa wrote:


Which ones were they exactly? I think JamesB wanted only common 
transport values in the transport class. If it is driver specific then 
it should go on the host or target or device with the scsi_host_template 
attrs.



It's a chicken  egg issue to put port mapper sysfs entry in scsi host
attributes. Application won't see sysfs unless initiator creates an

Sorry for the late response. I was on vacation.

That is only with how you coded it today. I asked you to do something 
like qla4xxx where the session and host are not so closely bound.


bnx2i is still using scsi host per iscsi session model, we plan to
migrate to scsi host per device model pretty soon. Previously you
indicated that you are working on this port, can you please share the
code? We will build on your work and bring this to closure.


Send me a link to your code so I can rebuild my stuff against it.





iSCSI session and driver can't create an iSCSI session without a tcp
That is not right with how things are today even. The iscsi_session 
struct can be created before the tcp connection. This was done because 
we thought we were going to have to use only sysfs for all setup and 
management (we ended up netlink and sysfs though).


You are right, software is flexible enough to allow creation of
session/connection and endpoint independently but so far user daemon
creates TCP endpoint before iSCSI session and bind them all together.
Any changes to this scheme will not be compatible with existing
distributions



So what do you need to do exactly? You want userspace to be able to set 
some session or connection value, then have it used for that session, 
right? We have been using the netlink interface for that. You would 
basically pass iscsiadm some value (on cmdline or in some file/db), then 
when the session and connection is being created we will pass those 
values in. Values that are used after we are in FFP, like abort 
timeouts, are passed in the set_param command. If a value is needed for 
the session/connection setup then we were passing that in with the 
command (like create_session has the queue depth sent with it). To add 
new values (driver specific and common ones) and have the interface 
compatible with older versions should not be too difficult.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-11-26 Thread Mike Christie

Anil Veerabhadrappa wrote:
The sysfs bits related to the hba should be use one of the scsi sysfs 
facilities or if they are related to iscsi bits and are generic then 
through the iscsi hba

bnx2i needs 2 sysfs entries -
1. QP size info - this is used to size per connection shared data
structures to issue work requests to chip (login, scsi cmd, tmf, nopin)
and get completions from the chip (scsi completions, async messages,
etc'). This is a iSCSI HBA attribute
2. port mapper - we can be more flexible on classifying this as either
iSCSI HBA attribute or bnx2i driver global attribute
Can hooks be added to iSCSI transport class to include these?

Which ones were they exactly? I think JamesB wanted only common 
transport values in the transport class. If it is driver specific then 
it should go on the host or target or device with the scsi_host_template 
attrs.




It's a chicken  egg issue to put port mapper sysfs entry in scsi host
attributes. Application won't see sysfs unless initiator creates an


Sorry for the late response. I was on vacation.

That is only with how you coded it today. I asked you to do something 
like qla4xxx where the session and host are not so closely bound.



iSCSI session and driver can't create an iSCSI session without a tcp


That is not right with how things are today even. The iscsi_session 
struct can be created before the tcp connection. This was done because 
we thought we were going to have to use only sysfs for all setup and 
management (we ended up netlink and sysfs though).



port. I was wondering if there is a better way than using IOCTL in this
situation?

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-07 Thread Mike Christie

Anil Veerabhadrappa wrote:



+
+/* iSCSI stages */
+#define ISCSI_STAGE_SECURITY_NEGOTIATION (0)
+#define ISCSI_STAGE_LOGIN_OPERATIONAL_NEGOTIATION (1)
+#define ISCSI_STAGE_FULL_FEATURE_PHASE (3)
+/* Logout response codes */
+#define ISCSI_LOGOUT_RESPONSE_CONNECTION_CLOSED (0)
+#define ISCSI_LOGOUT_RESPONSE_CID_NOT_FOUND (1)
+#define ISCSI_LOGOUT_RESPONSE_CLEANUP_FAILED (3)
+
+/* iSCSI task types */
+#define ISCSI_TASK_TYPE_READ(0)
+#define ISCSI_TASK_TYPE_WRITE   (1)
+#define ISCSI_TASK_TYPE_MPATH   (2)




All of these iscsi code shoulds be in iscsi_proto.h or should be added 
there.

This is a very tricky proposal as this header file is automatically
generated by a well defined process and is shared between various driver
supporting multiple platform/OS and the firmware. If it is not of a big
issue I would like to keep it the way it is.


The values that are iscsi RFC values should come from the iscsi_proto.h 
file and not be duplicated for each driver.




+/*
+ * hardware reset
+ */
+int bnx2i_reset(struct scsi_cmnd *sc)
+{
+   return 0;
+}


So what is up with this one? It seems like if there is a way to reset 
hardware then you would want it as the scsi eh host reset callout 
instead of dropping the session. We could add some transport level 
recovery callouts for the iscsi specifics.


We may not be able to support HBA cold reset as bnx2 driver is the
primary owner of chip reset and initialization. This is the drawback of
sharing network interface with the NIC driver. If there is a need for
administrator to reset the iSCSI port same can be achieved by running
'ifdown eth#' and 'ifup eth#'.
Current driver even allows ethernet interface reset when there are
active iSCSI connection, all active iscsi sessions will be reinstated
when the network link comes back live
 



If you cannot support it or it does not make sense just remove the stub 
then. I say it is not a big deal now, but hopefully we do not hit fun 
like with qla3xxx and qla4xxx :)



+
+void bnx2i_sysfs_cleanup(void)
+{
+   class_device_unregister(port_class_dev);
+   class_unregister(bnx2i_class);
+}
The sysfs bits related to the hba should be use one of the scsi sysfs 
facilities or if they are related to iscsi bits and are generic then 
through the iscsi hba


bnx2i needs 2 sysfs entries -
1. QP size info - this is used to size per connection shared data
structures to issue work requests to chip (login, scsi cmd, tmf, nopin)
and get completions from the chip (scsi completions, async messages,
etc'). This is a iSCSI HBA attribute
2. port mapper - we can be more flexible on classifying this as either
iSCSI HBA attribute or bnx2i driver global attribute
Can hooks be added to iSCSI transport class to include these?



Which ones were they exactly? I think JamesB wanted only common 
transport values in the transport class. If it is driver specific then 
it should go on the host or target or device with the scsi_host_template 
attrs.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SG_IO with 4k buffer size to iscsi sg device causes Bad page panic

2007-05-08 Thread Mike Christie
Qi, Yanling wrote:
 Hi All,
 
 This panic is related to the interactions between scsi/sg.c, iscsi
 initiator and tcp on the RHEL 2.6.9-42 kernel. But we may also have the
 similar problem with open-iscsi initiator. I will explain why we see the

Yeah, this problem should occur in the upstream open-iscsi iscsi code.
open-iscsi works very similar to linux-scsi where it just sends pages
around with sock-ops-sendpage, and it looks like sg uses
__get_free_pages in RHEL's kernel and upstream it uses alloc_pages so
unless there was a change in those functions or the network layer then
we should have a similar problem.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.

2006-09-14 Thread Mike Christie
Peter Zijlstra wrote:
 On Wed, 2006-09-13 at 15:50 -0500, Mike Christie wrote:
 Peter Zijlstra wrote:
 Implement sht-swapdev() for iSCSI. This method takes care of reserving
 the extra memory needed and marking all relevant sockets with SOCK_VMIO.

 When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
 the TCP connect is done with SOCK_VMIO to ensure their success. Also the
 netlink userspace interface is marked SOCK_VMIO, this will ensure that even
 under pressure we can still communicate with the daemon (which runs as
 mlockall() and needs no additional memory to operate).

 Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
 present. This ensures that the netlink socket will not block. User-space 
 will
 need to retry failed requests.

 The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
 This makes sure we do not block the critical socket, and that we do not
 fail to process incomming data.

 Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
 CC: Mike Christie [EMAIL PROTECTED]
 ---
  drivers/scsi/iscsi_tcp.c|  103 
 +++-
  drivers/scsi/scsi_transport_iscsi.c |   23 +++-
  include/scsi/libiscsi.h |1 
  include/scsi/scsi_transport_iscsi.h |2 
  4 files changed, 113 insertions(+), 16 deletions(-)

 Index: linux-2.6/drivers/scsi/iscsi_tcp.c
 ===
 --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
 +++ linux-2.6/drivers/scsi/iscsi_tcp.c
 @@ -42,6 +42,7 @@
  #include scsi/scsi_host.h
  #include scsi/scsi.h
  #include scsi/scsi_transport_iscsi.h
 +#include scsi/scsi_device.h
  
  #include iscsi_tcp.h
  
 @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
 int rc;
 struct iscsi_conn *conn = rd_desc-arg.data;
 struct iscsi_tcp_conn *tcp_conn = conn-dd_data;
 -   int processed;
 +   int processed = 0;
 char pad[ISCSI_PAD_LEN];
 struct scatterlist sg;
 +   unsigned long pflags = current-flags;
 +
 +   if (sk_has_vmio(tcp_conn-sock-sk))
 +   current-flags |= PF_MEMALLOC;
  
 Is this too late or not needed or what is it for? This function gets run
 from the network layer's softirq and at this point we have a skbuff with
 data that we want to process. The iscsi layer also does not allocate
 memory for read or write IO in this path.
 
 I thought I found allocations in that path, lemme search...
 found this:
 
 iscsi_tcp_data_recv()
   iscsi_data_rescv()
 iscsi_complete_pdu()
   __iscsi_complete_pdu()
 iscsi_recv_pdu()
   alloc_skb( GFP_ATOMIC);
 

You are right that is for the netlink interface. Could we move the
PF_MEMALLOC setting and clearing to iscsi_recv_pdu and and add it to
iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
qla4xxx will have it set when they need it. I will send a patch for this
along with a way to have the netlink sock vmio set for all iscsi drivers
that need it.


 I think we would want to set this flag at a lower level. Something
 closer to where the skbuf is allocated?
 
 Is that the skbuff you were talking about? If so, I'd need to carve a
 path to pass the swapper information. I had that in a previous patch,
 but that was large and ugly. I had to go carrying gfp_t flags all
 through that call chain.
 

In my original post I was just concerned about the sk_buff that gets
passed to the iscsi layer in iscsi_tcp_data_recv. I was wondering if the
chunk of code in the network layer or network driver that allocated that
skbuff needed to set PF_MEMALLOC.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.

2006-09-14 Thread Mike Christie
Peter Zijlstra wrote:
 On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
 Peter Zijlstra wrote:
 On Wed, 2006-09-13 at 15:50 -0500, Mike Christie wrote:
 Peter Zijlstra wrote:
 Implement sht-swapdev() for iSCSI. This method takes care of reserving
 the extra memory needed and marking all relevant sockets with SOCK_VMIO.

 When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
 the TCP connect is done with SOCK_VMIO to ensure their success. Also the
 netlink userspace interface is marked SOCK_VMIO, this will ensure that 
 even
 under pressure we can still communicate with the daemon (which runs as
 mlockall() and needs no additional memory to operate).

 Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
 present. This ensures that the netlink socket will not block. User-space 
 will
 need to retry failed requests.

 The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
 This makes sure we do not block the critical socket, and that we do not
 fail to process incomming data.

 Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
 CC: Mike Christie [EMAIL PROTECTED]
 ---
  drivers/scsi/iscsi_tcp.c|  103 
 +++-
  drivers/scsi/scsi_transport_iscsi.c |   23 +++-
  include/scsi/libiscsi.h |1 
  include/scsi/scsi_transport_iscsi.h |2 
  4 files changed, 113 insertions(+), 16 deletions(-)

 Index: linux-2.6/drivers/scsi/iscsi_tcp.c
 ===
 --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
 +++ linux-2.6/drivers/scsi/iscsi_tcp.c
 @@ -42,6 +42,7 @@
  #include scsi/scsi_host.h
  #include scsi/scsi.h
  #include scsi/scsi_transport_iscsi.h
 +#include scsi/scsi_device.h
  
  #include iscsi_tcp.h
  
 @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
   int rc;
   struct iscsi_conn *conn = rd_desc-arg.data;
   struct iscsi_tcp_conn *tcp_conn = conn-dd_data;
 - int processed;
 + int processed = 0;
   char pad[ISCSI_PAD_LEN];
   struct scatterlist sg;
 + unsigned long pflags = current-flags;
 +
 + if (sk_has_vmio(tcp_conn-sock-sk))
 + current-flags |= PF_MEMALLOC;
  
 Is this too late or not needed or what is it for? This function gets run
 from the network layer's softirq and at this point we have a skbuff with
 data that we want to process. The iscsi layer also does not allocate
 memory for read or write IO in this path.
 I thought I found allocations in that path, lemme search...
 found this:

 iscsi_tcp_data_recv()
   iscsi_data_rescv()
 iscsi_complete_pdu()
   __iscsi_complete_pdu()
 iscsi_recv_pdu()
   alloc_skb( GFP_ATOMIC);

 You are right that is for the netlink interface. Could we move the
 PF_MEMALLOC setting and clearing to iscsi_recv_pdu and and add it to
 iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
 qla4xxx will have it set when they need it. I will send a patch for this
 along with a way to have the netlink sock vmio set for all iscsi drivers
 that need it.
 
 I already have such a patch, look at:
 http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch
 

You are drowning me in patches :) I did not see that one. I was still
commenting on this patch :)

The new patch looks ok.


 but what conditional do you want to use for PF_MEMALLOC, an
 unconditional setting will be highly unpopular.
 
 Hmm, perhaps you could key it of sk_has_vmio(nls)...

Yes.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.

2006-09-14 Thread Mike Christie
Mike Christie wrote:
 Peter Zijlstra wrote:
 On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
 Peter Zijlstra wrote:
 On Wed, 2006-09-13 at 15:50 -0500, Mike Christie wrote:
 Peter Zijlstra wrote:
 Implement sht-swapdev() for iSCSI. This method takes care of reserving
 the extra memory needed and marking all relevant sockets with SOCK_VMIO.

 When used for swapping, TCP socket creation is done under GFP_MEMALLOC 
 and
 the TCP connect is done with SOCK_VMIO to ensure their success. Also the
 netlink userspace interface is marked SOCK_VMIO, this will ensure that 
 even
 under pressure we can still communicate with the daemon (which runs as
 mlockall() and needs no additional memory to operate).

 Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper 
 is
 present. This ensures that the netlink socket will not block. User-space 
 will
 need to retry failed requests.

 The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
 This makes sure we do not block the critical socket, and that we do not
 fail to process incomming data.

 Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
 CC: Mike Christie [EMAIL PROTECTED]
 ---
  drivers/scsi/iscsi_tcp.c|  103 
 +++-
  drivers/scsi/scsi_transport_iscsi.c |   23 +++-
  include/scsi/libiscsi.h |1 
  include/scsi/scsi_transport_iscsi.h |2 
  4 files changed, 113 insertions(+), 16 deletions(-)

 Index: linux-2.6/drivers/scsi/iscsi_tcp.c
 ===
 --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
 +++ linux-2.6/drivers/scsi/iscsi_tcp.c
 @@ -42,6 +42,7 @@
  #include scsi/scsi_host.h
  #include scsi/scsi.h
  #include scsi/scsi_transport_iscsi.h
 +#include scsi/scsi_device.h
  
  #include iscsi_tcp.h
  
 @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
  int rc;
  struct iscsi_conn *conn = rd_desc-arg.data;
  struct iscsi_tcp_conn *tcp_conn = conn-dd_data;
 -int processed;
 +int processed = 0;
  char pad[ISCSI_PAD_LEN];
  struct scatterlist sg;
 +unsigned long pflags = current-flags;
 +
 +if (sk_has_vmio(tcp_conn-sock-sk))
 +current-flags |= PF_MEMALLOC;
  
 Is this too late or not needed or what is it for? This function gets run
 from the network layer's softirq and at this point we have a skbuff with
 data that we want to process. The iscsi layer also does not allocate
 memory for read or write IO in this path.
 I thought I found allocations in that path, lemme search...
 found this:

 iscsi_tcp_data_recv()
   iscsi_data_rescv()
 iscsi_complete_pdu()
   __iscsi_complete_pdu()
 iscsi_recv_pdu()
   alloc_skb( GFP_ATOMIC);

 You are right that is for the netlink interface. Could we move the
 PF_MEMALLOC setting and clearing to iscsi_recv_pdu and and add it to
 iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
 qla4xxx will have it set when they need it. I will send a patch for this
 along with a way to have the netlink sock vmio set for all iscsi drivers
 that need it.
 I already have such a patch, look at:
 http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch

 
 You are drowning me in patches :) I did not see that one. I was still
 commenting on this patch :)
 
 The new patch looks ok.
 

Oh, I think you need a sock_put to go with netlink lookup (lookup does a
hold).
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.

2006-09-14 Thread Mike Christie
Peter Zijlstra wrote:
 On Thu, 2006-09-14 at 22:35 +0200, Peter Zijlstra wrote:
 On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
 
 I thought I found allocations in that path, lemme search...
 found this:

 iscsi_tcp_data_recv()
   iscsi_data_rescv()
 iscsi_complete_pdu()
   __iscsi_complete_pdu()
 iscsi_recv_pdu()
   alloc_skb( GFP_ATOMIC);

 You are right that is for the netlink interface. Could we move the
 PF_MEMALLOC setting and clearing to iscsi_recv_pdu and and add it to
 iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
 qla4xxx will have it set when they need it. I will send a patch for this
 along with a way to have the netlink sock vmio set for all iscsi drivers
 that need it.
 I already have such a patch, look at:
 http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch

 but what conditional do you want to use for PF_MEMALLOC, an
 unconditional setting will be highly unpopular.

 Hmm, perhaps you could key it of sk_has_vmio(nls)...
 
 On second thought, not such a good idea, that will still be too course.
 You only want to force feed stuff originating from
 sk_has_vmio(iscsi_tcp_conn-sock-sk) connections, not all
 connectections as soon as there is a swapper in the system.
 

You can move the iscsi_session-swapper field to the iscsi_cls_session
and have iscsi_swapdev take a iscsi_cls_session and set that flag.
iscsi_recv_pdu and iscsi_conn_error and all the llds can then access
this bit.

 In order to preserve that information you need extra state, abusing this
 process flags is as good as propagating __GFP_EMERGENCY down the call
 chain with extra gfp_t arguments, perhaps even better, since it will
 make sure we catch all allocations.
 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.

2006-09-14 Thread Mike Christie
Mike Christie wrote:
 Peter Zijlstra wrote:
 On Thu, 2006-09-14 at 22:35 +0200, Peter Zijlstra wrote:
 On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
 I thought I found allocations in that path, lemme search...
 found this:

 iscsi_tcp_data_recv()
   iscsi_data_rescv()
 iscsi_complete_pdu()
   __iscsi_complete_pdu()
 iscsi_recv_pdu()
   alloc_skb( GFP_ATOMIC);

 You are right that is for the netlink interface. Could we move the
 PF_MEMALLOC setting and clearing to iscsi_recv_pdu and and add it to
 iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
 qla4xxx will have it set when they need it. I will send a patch for this
 along with a way to have the netlink sock vmio set for all iscsi drivers
 that need it.
 I already have such a patch, look at:
 http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch

 but what conditional do you want to use for PF_MEMALLOC, an
 unconditional setting will be highly unpopular.

 Hmm, perhaps you could key it of sk_has_vmio(nls)...
 On second thought, not such a good idea, that will still be too course.
 You only want to force feed stuff originating from
 sk_has_vmio(iscsi_tcp_conn-sock-sk) connections, not all
 connectections as soon as there is a swapper in the system.

 
 You can move the iscsi_session-swapper field to the iscsi_cls_session
 and have iscsi_swapdev take a iscsi_cls_session and set that flag.
 iscsi_recv_pdu and iscsi_conn_error and all the llds can then access
 this bit.
 
 In order to preserve that information you need extra state, abusing this
 process flags is as good as propagating __GFP_EMERGENCY down the call
 chain with extra gfp_t arguments, perhaps even better, since it will
 make sure we catch all allocations.


Oh yeah, on the send side we also allocate some memory for the netlink
interface if there is a connection error (iscsi_conn_failure -
iscsi_conn_error). And when that is called from the transmit side we can
change the GFP_ATOMICs to GFP_NOIOs since we have process context.

So I am just saying we need to set that flag in a couple more places (if
you set it in iscsi_conn_error if iscsi_cls_session-swapper is set then
don't worry about it), and that I need to change iscsi_conn_failure and
iscsi_conn_error to take a gfp_t as an argument (or do a in_interrupt or
something) so we can use GFP_NOIO in the transmit code.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.

2006-09-13 Thread Mike Christie
Peter Zijlstra wrote:
 Implement sht-swapdev() for iSCSI. This method takes care of reserving
 the extra memory needed and marking all relevant sockets with SOCK_VMIO.
 
 When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
 the TCP connect is done with SOCK_VMIO to ensure their success. Also the
 netlink userspace interface is marked SOCK_VMIO, this will ensure that even
 under pressure we can still communicate with the daemon (which runs as
 mlockall() and needs no additional memory to operate).
 
 Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
 present. This ensures that the netlink socket will not block. User-space will
 need to retry failed requests.
 
 The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
 This makes sure we do not block the critical socket, and that we do not
 fail to process incomming data.
 
 Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
 CC: Mike Christie [EMAIL PROTECTED]
 ---
  drivers/scsi/iscsi_tcp.c|  103 
 +++-
  drivers/scsi/scsi_transport_iscsi.c |   23 +++-
  include/scsi/libiscsi.h |1 
  include/scsi/scsi_transport_iscsi.h |2 
  4 files changed, 113 insertions(+), 16 deletions(-)
 
 Index: linux-2.6/drivers/scsi/iscsi_tcp.c
 ===
 --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
 +++ linux-2.6/drivers/scsi/iscsi_tcp.c
 @@ -42,6 +42,7 @@
  #include scsi/scsi_host.h
  #include scsi/scsi.h
  #include scsi/scsi_transport_iscsi.h
 +#include scsi/scsi_device.h
  
  #include iscsi_tcp.h
  
 @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
   int rc;
   struct iscsi_conn *conn = rd_desc-arg.data;
   struct iscsi_tcp_conn *tcp_conn = conn-dd_data;
 - int processed;
 + int processed = 0;
   char pad[ISCSI_PAD_LEN];
   struct scatterlist sg;
 + unsigned long pflags = current-flags;
 +
 + if (sk_has_vmio(tcp_conn-sock-sk))
 + current-flags |= PF_MEMALLOC;
  

Is this too late or not needed or what is it for? This function gets run
from the network layer's softirq and at this point we have a skbuff with
data that we want to process. The iscsi layer also does not allocate
memory for read or write IO in this path.

I think we would want to set this flag at a lower level. Something
closer to where the skbuf is allocated?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread Mike Christie
James Smart wrote:
 
 Mike Christie wrote:
 For the tasks you want to do for the fc class is performance critical?
 
 No, it should not be.
 
 If not, you could do what the iscsi class (for the netdev people this is
 drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
 copies. For iscsi we do this in userspace to send down a login pdu:

 /*
  * xmitbuf is a buffer that is large enough for the iscsi_event,
  * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
  */
 
 Well, the real difference is that the payload of the message is actually
 the payload of the SCSI command or ELS/CT Request. Thus, the payload may

I am not sure I follow. For iscsi, everything after the iscsi_event
struct can be the iscsi request that is to be transmitted. The payload
will not normally be Mbytes but it is not a couple if bytes.

 range in size from a few hundred bytes to several kbytes ( 1 page) to
 Mbyte's in size. Rather than buffer all of this, and push it over the
 socket,
 thus the extra copies - it would best to have the LLDD simply DMA the
 payload like on a typical SCSI command.  Additionally, there will be
 response data that can be several kbytes in length.
 

Once you have got the buffer to the class, the class can create a
scatterlist to DMA from for the LLD. I thought. iscsi does not do this
just because it is software right now. For qla4xxx we do not need
something like what you are talking about (see below for what I was
thinking about for the initiators). If you are saying the extra step of
the copy is plain dumb, I agree, but this happens (you have to suffer
some copy and cannot do dio) for sg io as well in some cases. I think
for the sg driver the copy_*_user is the default.

Instead of netlink for scsi commands and transport requests

For scsi commands could we just use sg io, or is there something special
about the command you want to send? If you can use sg io for scsi
commands, maybe for transport level requests (in my example iscsi pdu)
we could modify something like sg/bsg/block layer scsi_ioctl.c to send
down transport requests to the classes and encapsulate them in some new
struct transport_requests or use the existing struct request but do that
thing people keep taling about using the request/request_queue for
message passing.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread Mike Christie
Mike Christie wrote:
 James Smart wrote:
 Mike Christie wrote:
 For the tasks you want to do for the fc class is performance critical?
 No, it should not be.

 If not, you could do what the iscsi class (for the netdev people this is
 drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
 copies. For iscsi we do this in userspace to send down a login pdu:

 /*
  * xmitbuf is a buffer that is large enough for the iscsi_event,
  * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
  */
 Well, the real difference is that the payload of the message is actually
 the payload of the SCSI command or ELS/CT Request. Thus, the payload may
 
 I am not sure I follow. For iscsi, everything after the iscsi_event
 struct can be the iscsi request that is to be transmitted. The payload
 will not normally be Mbytes but it is not a couple if bytes.
 
 range in size from a few hundred bytes to several kbytes ( 1 page) to
 Mbyte's in size. Rather than buffer all of this, and push it over the
 socket,
 thus the extra copies - it would best to have the LLDD simply DMA the
 payload like on a typical SCSI command.  Additionally, there will be
 response data that can be several kbytes in length.

 
 Once you have got the buffer to the class, the class can create a
 scatterlist to DMA from for the LLD. I thought. iscsi does not do this
 just because it is software right now. For qla4xxx we do not need

That should be, we do need.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread Mike Christie
James Smart wrote:
 Note: We've transitioned off topic. If what this means is there isn't a
 good
 way except by ioctls (which still isn't easily portable) or system calls,
 then that's ok. Then at least we know the limits and can look at other
 implementation alternatives.
 
 Mike Christie wrote:
 James Smart wrote:
 Mike Christie wrote:
 For the tasks you want to do for the fc class is performance critical?
 No, it should not be.

 If not, you could do what the iscsi class (for the netdev people
 this is
 drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
 copies. For iscsi we do this in userspace to send down a login pdu:

 /*
  * xmitbuf is a buffer that is large enough for the iscsi_event,
  * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
  */
 Well, the real difference is that the payload of the message is
 actually
 the payload of the SCSI command or ELS/CT Request. Thus, the payload may

 I am not sure I follow. For iscsi, everything after the iscsi_event
 struct can be the iscsi request that is to be transmitted. The payload
 will not normally be Mbytes but it is not a couple if bytes.
 
 True... For a large read/write - it will eventually total what the i/o
 request size was, and you did have to push it through the socekt.
 What this discussion really comes down to is the difference between
 initiator
 offload and what a target does.
 
 The initiator offloads the full i/o from the users - e.g. send command,
 get response. In the initiator case, the user isn't aware of each and
 every IU that makes up the i/o. As it's on an i/o basis, the LLDD doing
 the offload needs the full buffer sitting and ready. DMA is preferred so
 the buffer doesn't have to be consuming socket/kernel/driver buffers while
 it's pending - plus speed.
 
 In the target case, the target controls each IU and it's size, thus it
 only has to have access to as much buffer space as it wants to push the
 next
 IU. The i/o can be paced by the target. Unfortunately, this is an
 entirely
 different use model than users of a scsi initiator expect, and it won't map
 well into replacing things like our sg_io ioctls.


I am not talking about the target here. For the open-iscsi initiator
that is in mainline that I referecnced in the example we send pdus from
userpsace to the LLD. In the future, initaitors that offload some iscsi
processing and will login from userspace or have userspace monitor the
transport by doing iscsi pings, we need to be able to send these pdus.
And the iscsi pdu cannot be broken up at the iscsi level (they can at
the interconect level though). From the iscsi host level they have to go
out like a scsi command would in that the LLD cannot decide to send out
mutiple pdus for he pdu that userspace sends down.

I do agree with you that targets can break down a scsi command into
multiple transport level packets as it sees fit.


 
 Instead of netlink for scsi commands and transport requests

 For scsi commands could we just use sg io, or is there something special
 about the command you want to send? If you can use sg io for scsi
 commands, maybe for transport level requests (in my example iscsi pdu)
 we could modify something like sg/bsg/block layer scsi_ioctl.c to send
 down transport requests to the classes and encapsulate them in some new
 struct transport_requests or use the existing struct request but do that
 thing people keep taling about using the request/request_queue for
 message passing.
 
 Well - there's 2 parts to this answer:
 
 First : IOCTL's are considered dangerous/bad practice and therefore it
 would

Yeah, i am not trying to kill ioctls. I go where the community goes.
What I am trying to dois just reuse the sg io mapping code so that we do
not end up with sg, st, target, blk scsi_ioctl.c and bsg all doing
similar things.


   be nice to find a replacement mechanism that eliminates them. If that
   mechanism has some of the cool features that netlink does, even better.
   Using sg io, in the manner you indicate, wouldn't remove the ioctl use.
   Note: I have OEMs/users that are very confused about the community's
 statement
   about ioctls. They've heard they are bad, should never be allowed,
 will no
   be longer supported, but yet they are at the heart of DM and sg io and
 other
   subsystems. Other than a grandfathered explanation, they don't
 understand
   why the rules bend for one piece of code but not for another. To them,
 all
   the features are just as critical regardless of whose providing them.
 
 Second: transport level i/o could be done like you suggest, and we've
   prototyped some of this as well. However, there's something very wrong
   about putting block device wrappers and settings around something that
   is not a block device.  In general, it's a heck of a lot of overhead and
   still doesn't solve the real issue - how to portably pass that user
 buffer


I am not talking about putting block device wrappers. This the magic
part

Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread Mike Christie
Mike Christie wrote:
 James Smart wrote:
 Note: We've transitioned off topic. If what this means is there isn't a
 good
 way except by ioctls (which still isn't easily portable) or system calls,
 then that's ok. Then at least we know the limits and can look at other
 implementation alternatives.

 Mike Christie wrote:
 James Smart wrote:
 Mike Christie wrote:
 For the tasks you want to do for the fc class is performance critical?
 No, it should not be.

 If not, you could do what the iscsi class (for the netdev people
 this is
 drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
 copies. For iscsi we do this in userspace to send down a login pdu:

 /*
  * xmitbuf is a buffer that is large enough for the iscsi_event,
  * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
  */
 Well, the real difference is that the payload of the message is
 actually
 the payload of the SCSI command or ELS/CT Request. Thus, the payload may
 I am not sure I follow. For iscsi, everything after the iscsi_event
 struct can be the iscsi request that is to be transmitted. The payload
 will not normally be Mbytes but it is not a couple if bytes.
 True... For a large read/write - it will eventually total what the i/o
 request size was, and you did have to push it through the socekt.
 What this discussion really comes down to is the difference between
 initiator
 offload and what a target does.

 The initiator offloads the full i/o from the users - e.g. send command,
 get response. In the initiator case, the user isn't aware of each and
 every IU that makes up the i/o. As it's on an i/o basis, the LLDD doing
 the offload needs the full buffer sitting and ready. DMA is preferred so
 the buffer doesn't have to be consuming socket/kernel/driver buffers while
 it's pending - plus speed.

 In the target case, the target controls each IU and it's size, thus it
 only has to have access to as much buffer space as it wants to push the
 next
 IU. The i/o can be paced by the target. Unfortunately, this is an
 entirely
 different use model than users of a scsi initiator expect, and it won't map
 well into replacing things like our sg_io ioctls.
 
 
 I am not talking about the target here. For the open-iscsi initiator
 that is in mainline that I referecnced in the example we send pdus from
 userpsace to the LLD. In the future, initaitors that offload some iscsi
 processing and will login from userspace or have userspace monitor the
 transport by doing iscsi pings, we need to be able to send these pdus.
 And the iscsi pdu cannot be broken up at the iscsi level (they can at
 the interconect level though). From the iscsi host level they have to go
 out like a scsi command would in that the LLD cannot decide to send out
 mutiple pdus for he pdu that userspace sends down.
 
 I do agree with you that targets can break down a scsi command into
 multiple transport level packets as it sees fit.
 

Oh yeah is

FC IU == iscsi tcp packet
or
FC IU == iscsi pdu
?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Netlink and user-space buffer pointers

2006-04-19 Thread Mike Christie
James Smart wrote:
 Folks,
 
 To take netlink to where we want to use it within the SCSI subsystem (as
 the mechanism of choice to replace ioctls), we're going to need to pass
 user-space buffer pointers.
 
 What is the best, portable manner to pass a pointer between user and kernel
 space within a netlink message ?  The example I've seen is in the iscsi
 target code - and it's passed between user-kernel space as a u64, then
 typecast to a void *, and later within the bio_map_xxx functions, as an
 unsigned long. I assume we are going to continue with this method ?
 

I do not know if it is needed. For the target code, we use the
bio_map_xxx to avoid having to copy the command data which is needed for
decent performance. We have also been trying to figure out ways of
getting out of using netlink to send the command info (cdb, tag info,
etc) around, because in some of Tomo's tests using mmaped packet sockets
he was able to imporove performance by removing that copy from the
kernel to userspace. We had problems with that though and other nice
interfaces like relayfs only allowed us to pass data from the kernel to
userspace so we still need another interface to pass things from
userspace to the kernel. Still working on this though. If someone knows
a interface please let us know.

For the tasks you want to do for the fc class is performance critical?
If not, you could do what the iscsi class (for the netdev people this is
drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
copies. For iscsi we do this in userspace to send down a login pdu:

/*
 * xmitbuf is a buffer that is large enough for the iscsi_event,
 * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
 */
memset(xmitbuf, 0, sizeof(*ev) + hdr_size + data_size);
xmitlen = sizeof(*ev);
ev = xmitbuf;
ev-type = ISCSI_UEVENT_SEND_PDU;
ev-transport_handle = transport_handle;
ev-u.send_pdu.sid = sid;
ev-u.send_pdu.cid = cid;
ev-u.send_pdu.hdr_size = hdr_size;
ev-u.send_pdu.data_size = data_size;

then later we do sendmsg()to send down the xmitbuf to the kernel iscsi
driver. I think there may be issues with packing structs or 32 bit
userspace and 64 bit kernels and other fun things like this so the iscsi
pdu and iscsi event have to be defined correctly and I guess we are back
to some of the problems with ioctls :(
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] scsi tgt: scsi target netlink interface

2006-02-16 Thread Mike Christie
This patch implments a netlink interface for the scsi tgt framework.
I was not sure if code using the netlink interface had to get reviewed
by the netdev guys. I am ccing them on this patch and providing
a basic review of why/how we want to use netlink.

I did not think the netdev people wanted to see the scsi and
block layer code, so I am just sending the netlink interface part
of this patchset to netdev. I can resend the other parts if
needed.

The scsi tgt framework, adds support for scsi target mode
cards. So instead of using the scsi card in your box as a initiator/host
you can use it as a target/server.

The reason for the netlink use is becuase the target normally
receives a interrupt indicating that a command or event is
ready to be processed. The scsi card's driver will then call
a scsi lib function which eventually calls scsi_tgt_uspace_send (in
this patch below) to tell userspace to begin to process the request
(userspace contains the state model). Later userspace will call back
into the kernel by sending a netlink msg, and instruct the scsi driver
what to do next. When the scsi driver is done executing the
operation, it will send a netlink message back to userspace
to indicate the success or failure of the operation (using 
scsi_tgt_uspace_send_status in the patch below).


Signed-off-by: Mike Christie [EMAIL PROTECTED]
Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]


diff --git a/drivers/scsi/scsi_tgt_if.c b/drivers/scsi/scsi_tgt_if.c
new file mode 100644
index 000..38b35da
--- /dev/null
+++ b/drivers/scsi/scsi_tgt_if.c
@@ -0,0 +1,214 @@
+/*
+ * SCSI target kernel/user interface functions
+ *
+ * Copyright (C) 2005 FUJITA Tomonori [EMAIL PROTECTED]
+ * Copyright (C) 2005 Mike Christie [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+#include linux/blkdev.h
+#include linux/file.h
+#include linux/netlink.h
+#include net/tcp.h
+#include scsi/scsi.h
+#include scsi/scsi_cmnd.h
+#include scsi/scsi_device.h
+#include scsi/scsi_host.h
+#include scsi/scsi_tgt.h
+#include scsi/scsi_tgt_if.h
+
+#include scsi_tgt_priv.h
+
+static int tgtd_pid;
+static struct sock *nl_sk;
+
+static int send_event_res(uint16_t type, struct tgt_event *p,
+ void *data, int dlen, gfp_t flags, pid_t pid)
+{
+   struct tgt_event *ev;
+   struct nlmsghdr *nlh;
+   struct sk_buff *skb;
+   uint32_t len;
+
+   len = NLMSG_SPACE(sizeof(*ev) + dlen);
+   skb = alloc_skb(len, flags);
+   if (!skb)
+   return -ENOMEM;
+
+   nlh = __nlmsg_put(skb, pid, 0, type, len - sizeof(*nlh), 0);
+
+   ev = NLMSG_DATA(nlh);
+   memcpy(ev, p, sizeof(*ev));
+   if (dlen)
+   memcpy(ev-data, data, dlen);
+
+   return netlink_unicast(nl_sk, skb, pid, 0);
+}
+
+int scsi_tgt_uspace_send(struct scsi_cmnd *cmd, struct scsi_lun *lun, gfp_t 
gfp_mask)
+{
+   struct Scsi_Host *shost = scsi_tgt_cmd_to_host(cmd);
+   struct sk_buff *skb;
+   struct nlmsghdr *nlh;
+   struct tgt_event *ev;
+   struct tgt_cmd *tcmd;
+   int err, len;
+
+   len = NLMSG_SPACE(sizeof(*ev) + sizeof(struct tgt_cmd));
+   /*
+* TODO: add MAX_COMMAND_SIZE to ev and add mempool
+*/
+   skb = alloc_skb(NLMSG_SPACE(len), gfp_mask);
+   if (!skb)
+   return -ENOMEM;
+
+   nlh = __nlmsg_put(skb, tgtd_pid, 0, TGT_KEVENT_CMD_REQ,
+ len - sizeof(*nlh), 0);
+
+   ev = NLMSG_DATA(nlh);
+   ev-k.cmd_req.host_no = shost-host_no;
+   ev-k.cmd_req.cid = cmd-request-tag;
+   ev-k.cmd_req.data_len = cmd-request_bufflen;
+
+   dprintk(%d %u %u\n, ev-k.cmd_req.host_no, ev-k.cmd_req.cid,
+   ev-k.cmd_req.data_len);
+
+   /* FIXME: we need scsi core to do that. */
+   memcpy(cmd-cmnd, cmd-data_cmnd, MAX_COMMAND_SIZE);
+
+   tcmd = (struct tgt_cmd *) ev-data;
+   memcpy(tcmd-scb, cmd-cmnd, sizeof(tcmd-scb));
+   memcpy(tcmd-lun, lun, sizeof(struct scsi_lun));
+
+   err = netlink_unicast(nl_sk, skb, tgtd_pid, 0);
+   if (err  0)
+   printk(KERN_ERR scsi_tgt_uspace_send: could not send skb %d\n,
+  err);
+   return err;
+}
+
+int scsi_tgt_uspace_send_status(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+{
+   struct Scsi_Host