Hi All,

We are using the BMI code in OrangeFS 2.8.6. Here is our use case:

1. Malloc a buffer and write something to it.

2. Send the buffer using BMI routines.

3. Free the buffer and go to step 1.

We ran into the following problem:
The messages captured using ibdump are corrupt in iteration 2. They contain a 
mixture of data from iteration 1 and iteration 2.

Here is the analysis we did:
We noticed that when the data corruption occurs, the buffer always points to 
the same virtual address as in the previous iteration. After checking the BMI 
code, we found that the memcache code keeps the buffer registered (with 
ibverbs) and later uses the virtual address to decide whether a registered 
buffer can be reused.

However, I think the memcache shouldn't keep the buffer registered, because 
the user might free it. When the user frees the buffer and allocates a new 
one at the same address, the lookup can produce a false match against the 
stale registration, which might lead to data corruption.
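
To illustrate the false match, here is a minimal toy model of a cache keyed 
by virtual address. It is not the BMI memcache code: every name in it is 
hypothetical, and the "generation" field exists only so the sketch can tell 
the old allocation from the new one. A fixed array stands in for an address 
that malloc happens to hand out twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names come from the BMI source. */
#define TOY_CACHE_SLOTS 8

struct toy_entry {
    const void *addr;     /* virtual address, used as the lookup key */
    size_t len;           /* length of the registered region */
    unsigned generation;  /* which allocation this registration belongs to */
    bool valid;
};

struct toy_cache {
    struct toy_entry slot[TOY_CACHE_SLOTS];
};

/* Look up by address and length only, as described in the analysis above. */
static struct toy_entry *toy_lookup(struct toy_cache *c,
                                    const void *addr, size_t len)
{
    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
        struct toy_entry *e = &c->slot[i];
        if (e->valid && e->addr == addr && e->len >= len)
            return e;
    }
    return NULL;
}

/* Return a cached entry on a hit; otherwise fill a free slot. */
static struct toy_entry *toy_register(struct toy_cache *c, const void *addr,
                                      size_t len, unsigned generation)
{
    struct toy_entry *e = toy_lookup(c, addr, len);
    if (e)
        return e;  /* hit: may describe an allocation that was freed */

    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
        e = &c->slot[i];
        if (!e->valid) {
            e->addr = addr;
            e->len = len;
            e->generation = generation;
            e->valid = true;
            return e;
        }
    }
    return NULL;  /* cache full */
}

/* True when the second "allocation" is handed the first one's entry. */
static bool toy_false_match(void)
{
    struct toy_cache c = {0};
    char storage[64];  /* stands in for an address malloc returns twice */

    struct toy_entry *first = toy_register(&c, storage, sizeof storage, 1);
    /* The user frees and re-allocates here: same address, generation 2. */
    struct toy_entry *second = toy_register(&c, storage, sizeof storage, 2);

    return first != NULL && first == second && second->generation != 2;
}
```

In this model toy_false_match() returns true: the second request gets the 
entry created for the first allocation, because nothing in the key 
distinguishes the two.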

So at first, we tested the code with "define ENABLE_MEMCACHE 0" to disable the 
memcache. With that setting the test passed, which confirms that the data 
corruption is caused by the memcache. However, performance will suffer if the 
memcache is disabled completely.

Finally, we prepared the attached patch to solve the problem. It fixes the 
broken code in the branches that are compiled when memcache is disabled. It 
also deregisters the buffer whenever its use-count drops to 0 and registers 
it again the next time it is used.
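
The use-count idea can be sketched as follows. This is only an illustration 
under our own assumptions, not the code in the patch: the struct, the 
function names, and the stand-in register/deregister helpers are all 
hypothetical, and the real code would call into ibverbs where the helpers 
just flip a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch; these names do not appear in the attached patch. */
struct toy_reg {
    const void *addr;
    size_t len;
    int use_count;       /* number of in-flight users of this buffer */
    bool registered;     /* whether the region is currently registered */
    unsigned reg_calls;  /* how many times registration was performed */
};

/* Stand-ins for the real registration calls (assumed, not ibverbs). */
static void toy_backend_register(struct toy_reg *r)
{
    r->registered = true;
    r->reg_calls++;
}

static void toy_backend_deregister(struct toy_reg *r)
{
    r->registered = false;
}

/* Register on the first use; later concurrent users share it. */
static void toy_acquire(struct toy_reg *r, const void *addr, size_t len)
{
    if (r->use_count == 0) {
        r->addr = addr;
        r->len = len;
        toy_backend_register(r);
    }
    r->use_count++;
}

/* Deregister as soon as the last user is done with the buffer. */
static void toy_release(struct toy_reg *r)
{
    if (r->use_count > 0 && --r->use_count == 0)
        toy_backend_deregister(r);
}
```

With this scheme a buffer that has been released is never left registered, 
so a new allocation at the same address always gets a fresh registration. 
The cost is one extra register/deregister pair per idle period.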

Please feel free to share your thoughts and comments. Thank you very much.

Best Regards,
Jingwang.

Attachment: fix_memcache_issue.patch
Description: fix_memcache_issue.patch

_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
