Re: [openib-general] more comments on cxgb3
On Thu, 2007-02-08 at 08:40 +0200, Michael S. Tsirkin wrote: OK, so I looked at cxgb3 some more. Thanks! To summarise my previous comments, I think the cxio hal layer needs to go to make the code readable - if I understand correctly it is there for historical reasons only. I can do this but its low on the list of todos. I started looking at userspace/kernel interaction, and then went over to other code under cxgb3 (but not core/). - Consider a user that does e.g. create QP, but never calls mmap. Is there some code that will clean out the unclamed mmap object? I couldn't find it, and iwch_dealloc_ucontext does not seem to do anything with it. This is a bug. I've got a fix for it too. - Passing physical address to userspace and back looks suspicios. Especially this: uresp.physaddr = virt_to_phys(chp-cq.queue); Could you elaborate on the design here? What are these phy addresses and how come userspace needs to know the phy address? You are not doing DMA by this address, by any chance? No, Its not used for DMA by the HW. The physaddr is passed up to the user, and the user then mmaps() using that as the offset. I took this code from the ipath driver. It has been pointed out to me that this is broken for 32b userspace on a 64 kernel because mmap2() cannot pass down 64 bits. So I need to rework this code. - It seems that by passing in huge resource sizes, userspace will be able to drink up unlimited amounts of kernel memory. mthca handles this by using the mlock rlimit, should something be done here as well? Hmm. That's a good point. I'll put this on the todo as well. So mthca adds to process's rlimit value as things are allocated out of kernel memory (or maybe even anything that gets pinned)? A couple of comments on PDBG macro. - I'd like to suggest following the practice of prefixing macro names with module name (same goes for functions like get_mhp really) - unless they are local to file. - You are using __FUNCTION__ a lot - it might be to just to kill it, messages are unique so you'll be able to locate the msg source anyway, save some kernel text and logs will be shorter. In any case I think __func__ is the recommended gcc way to get the name currently. - comment near pr_debug definition in include/linux/kernel.h says: /* If you are writing a driver, please use dev_dbg instead */ so it might be a good idea for PDBG to follow this rule. - log messages do not look very informative to me. I also think they are a bit too many of them currently. For example, I do not think it is a good idea to print the kernel pointers out. For example, in code like the following: mhp = get_mhp(rhp, (sg_list[i].lkey) 8); if (!mhp) { PDBG(%s %d\n, __FUNCTION__, __LINE__); return -EIO; } might be better to say MR key XXX does not exist. Exiting.. -EIO also looks like a strange error code to return here, does it not? Maybe something like EINVAL would be more appropriate? I'll take a todo to clean up the debug stuff. - I wonder about the names like get_mhp - what does mhp mean? static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid) { return idr_find(rhp-mmidr, mmid); } Memory Handle Pointer. Looks like it looks up an mr. Is that right? Maybe the name shouldbe changed to convey this meaning. - In the following code, what does missing pdid check mean? /* * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now. */ This sounds interesting. Does this mean the code does not validate the PD currently? I need firmware support for this. It will be done asap and I can remove this code entirely. I have the same question for: /* TBD: check perms */ in iwch_bind_mw. BTW, does TBD stand for To Be Done here? Yes. Do you mean TODO really? - iwch_sgl2pbl_map is used in several places, and seems a bit too big to be inline Well, it's time to go do my day job now :) Hope this helps, Thanks again Michael! Steve. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] more comments on cxgb3
- It seems that by passing in huge resource sizes, userspace will be able to drink up unlimited amounts of kernel memory. mthca handles this by using the mlock rlimit, should something be done here as well? Hmm. That's a good point. I'll put this on the todo as well. So mthca adds to process's rlimit value as things are allocated out of kernel memory (or maybe even anything that gets pinned)? Yes. The code is actually in uverbs core, mthca uses that. - I wonder about the names like get_mhp - what does mhp mean? static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid) { return idr_find(rhp-mmidr, mmid); } Memory Handle Pointer. hmm, what's a Handle? Maybe a better name can be found. -- MST ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] more comments on cxgb3
- Consider a user that does e.g. create QP, but never calls mmap. Is there some code that will clean out the unclamed mmap object? I couldn't find it, and iwch_dealloc_ucontext does not seem to do anything with it. BTW: Here is my fix for this. - Clean up pending mmaps on ucontext deallocation. From: Steve Wise [EMAIL PROTECTED] Free all pending mmap structs when the ucontext is deallocated. Signed-off-by: Steve Wise [EMAIL PROTECTED] --- drivers/infiniband/hw/cxgb3/iwch_provider.c |1 + drivers/infiniband/hw/cxgb3/iwch_provider.h | 15 +++ 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index db2b0a8..98568ee 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -99,6 +99,7 @@ static int iwch_dealloc_ucontext(struct struct iwch_dev *rhp = to_iwch_dev(context-device); struct iwch_ucontext *ucontext = to_iwch_ucontext(context); PDBG(%s context %p\n, __FUNCTION__, context); + free_mmaps(ucontext); cxio_release_ucontext(rhp-rdev, ucontext-uctx); kfree(ucontext); return 0; diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index 1ede8a7..c8c07ee 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -199,6 +199,21 @@ struct iwch_mm_entry { unsigned len; }; +static inline void free_mmaps(struct iwch_ucontext *ucontext) +{ + struct list_head *pos, *nxt; + struct iwch_mm_entry *mm; + + spin_lock(ucontext-mmap_lock); + list_for_each_safe(pos, nxt, ucontext-mmaps) { + mm = list_entry(pos, struct iwch_mm_entry, entry); + list_del(mm-entry); + kfree(mm); + } + spin_unlock(ucontext-mmap_lock); + return; +} + static inline struct iwch_mm_entry *remove_mmap(struct iwch_ucontext *ucontext, u64 addr, unsigned len) { ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] more comments on cxgb3
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index db2b0a8..98568ee 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -99,6 +99,7 @@ static int iwch_dealloc_ucontext(struct struct iwch_dev *rhp = to_iwch_dev(context-device); struct iwch_ucontext *ucontext = to_iwch_ucontext(context); PDBG(%s context %p\n, __FUNCTION__, context); +free_mmaps(ucontext); cxio_release_ucontext(rhp-rdev, ucontext-uctx); kfree(ucontext); return 0; diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index 1ede8a7..c8c07ee 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -199,6 +199,21 @@ struct iwch_mm_entry { unsigned len; }; +static inline void free_mmaps(struct iwch_ucontext *ucontext) +{ +struct list_head *pos, *nxt; +struct iwch_mm_entry *mm; + +spin_lock(ucontext-mmap_lock); +list_for_each_safe(pos, nxt, ucontext-mmaps) { +mm = list_entry(pos, struct iwch_mm_entry, entry); +list_del(mm-entry); +kfree(mm); +} +spin_unlock(ucontext-mmap_lock); +return; +} Since you only have one caller, I would suggest just open-coding the deletion at the call-site (since that function is really too big to inline if it ever grows another caller). And I don't think you need the locking either, since there better be no one else looking at the context structure while you're in the process of freeing it. Something like: struct iwch_dev *rhp = to_iwch_dev(context-device); struct iwch_ucontext *ucontext = to_iwch_ucontext(context); struct iwch_mm_entry *mm, *tmp; PDBG(%s context %p\n, __FUNCTION__, context); list_for_each_entry_safe(mm, tmp, ucontext-mmaps) kfree(mm); cxio_release_ucontext(rhp-rdev, ucontext-uctx); kfree(ucontext); return 0; - R. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] more comments on cxgb3
OK, so I looked at cxgb3 some more. To summarise my previous comments, I think the cxio hal layer needs to go to make the code readable - if I understand correctly it is there for historical reasons only. I started looking at userspace/kernel interaction, and then went over to other code under cxgb3 (but not core/). - Consider a user that does e.g. create QP, but never calls mmap. Is there some code that will clean out the unclamed mmap object? I couldn't find it, and iwch_dealloc_ucontext does not seem to do anything with it. - Passing physical address to userspace and back looks suspicios. Especially this: uresp.physaddr = virt_to_phys(chp-cq.queue); Could you elaborate on the design here? What are these phy addresses and how come userspace needs to know the phy address? You are not doing DMA by this address, by any chance? - It seems that by passing in huge resource sizes, userspace will be able to drink up unlimited amounts of kernel memory. mthca handles this by using the mlock rlimit, should something be done here as well? A couple of comments on PDBG macro. - I'd like to suggest following the practice of prefixing macro names with module name (same goes for functions like get_mhp really) - unless they are local to file. - You are using __FUNCTION__ a lot - it might be to just to kill it, messages are unique so you'll be able to locate the msg source anyway, save some kernel text and logs will be shorter. In any case I think __func__ is the recommended gcc way to get the name currently. - comment near pr_debug definition in include/linux/kernel.h says: /* If you are writing a driver, please use dev_dbg instead */ so it might be a good idea for PDBG to follow this rule. - log messages do not look very informative to me. I also think they are a bit too many of them currently. For example, I do not think it is a good idea to print the kernel pointers out. For example, in code like the following: mhp = get_mhp(rhp, (sg_list[i].lkey) 8); if (!mhp) { PDBG(%s %d\n, __FUNCTION__, __LINE__); return -EIO; } might be better to say MR key XXX does not exist. Exiting.. -EIO also looks like a strange error code to return here, does it not? Maybe something like EINVAL would be more appropriate? - I wonder about the names like get_mhp - what does mhp mean? static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid) { return idr_find(rhp-mmidr, mmid); } Looks like it looks up an mr. Is that right? Maybe the name shouldbe changed to convey this meaning. - In the following code, what does missing pdid check mean? /* * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now. */ This sounds interesting. Does this mean the code does not validate the PD currently? I have the same question for: /* TBD: check perms */ in iwch_bind_mw. BTW, does TBD stand for To Be Done here? google says: Definitions of TBD on the Web: * To Be Determined, Defined, Decided. www.csr.com/ptot.htm * to be determined www.liberalsagainstterrorism.com/wiki/index.php/Counterinsurgency_Operations/Glossary * Treasury Board (Secretariat) www.psc-cfp.gc.ca/centres/definitions_and_notes_e.htm * The three letter abbreviation TBD may be/mean, depending on context: * an acronym for To Be Determined (...at a later point in time., typically)* the Douglas Devastator, a US Navy torpedo bomber of World War II en.wikipedia.org/wiki/TBD What is to be determined here? Do you mean TODO really? - iwch_sgl2pbl_map is used in several places, and seems a bit too big to be inline Well, it's time to go do my day job now :) Hope this helps, -- MST ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general