Hi Chris

Christopher Dwan wrote:

I've asked this here before, but I'm hoping that there have been some changes or updates and I missed the email:

What's the state of the art in terms of configuring permissions on BLAST target files (specifically the MBF files) so that multiple users can run searches against targets in a shared directory, make use of cached index and sequence files, and not encounter the dreaded:

Error opening /common/data/nt.mbf
open: Permission denied
Fatal Error:

This is very hard, as you do not know a priori whether

a) two mpiblast jobs are from users in the same group
b) two mpiblast jobs using a database named "qwerty" actually want to use the same database. You would need to store hash signatures per db-chunk.
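Problem (b) could in principle be addressed by fingerprinting each chunk; a minimal sketch of the idea, assuming SHA-256 digests and a flat list of chunk files per database (the function names and layout are hypothetical, not anything mpiblast actually does):

```python
import hashlib


def chunk_signature(path, block=1 << 20):
    """Return the SHA-256 hex digest of one database chunk file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            data = f.read(block)
            if not data:
                break
            h.update(data)
    return h.hexdigest()


def same_database(chunks_a, chunks_b):
    """Two jobs may share a cache only if every chunk hashes identically."""
    if len(chunks_a) != len(chunks_b):
        return False
    return all(chunk_signature(a) == chunk_signature(b)
               for a, b in zip(chunks_a, chunks_b))
```

With per-chunk signatures stored alongside the cache, two jobs both asking for "qwerty" can be told apart even when the databases differ.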

 From perusing the list archives I see the following options:

* Make a custom BLAST target directory for every user (not enough disk space, doesn't scale)

This is the only solution I know that will work. Of course, the other issue is that if someone starts mpiblasting a really huge db, you have effectively denied service on the compute nodes to other jobs due to disk cache pressure.


* Hack MPIBlast code to change the file permissions on the MBF files (seems reasonable, but there must be some reason why it's not already done. Probably concerns about users stepping on each other's jobs).

This is something I ran into ~3 years ago with early versions. Users would stamp all over others' runs, and complain that the cluster was broken.
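If one did patch the fragment-copy step, the change would amount to relaxing the mode on each cached file after it is written. A hedged sketch, assuming a POSIX filesystem (the function name is mine, and note this only helps when the users in question share a group, which is exactly problem (a) above):

```python
import os
import stat


def make_group_readable(path):
    """Add group-read to a cached fragment so group members can reuse it.

    This mirrors what a patched mpiblast copy step might do; it does NOT
    solve the 'users in different groups' problem.
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IRGRP)  # add group read, keep other bits
```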

There is no satisfactory solution other than isolation, and isolation effectively defeats caching. If we broke the file distribution (or more accurately, the chunk distribution) out of mpiblast, a separate lower layer might be able to handle it, but that layer would need to be designed.

An alternative (but a very bad one, for a number of reasons) is to put all users in the same group and grant everyone group read/write permission on the cache directory. Avoid any scheme that does this, due to the potential for users to step on each other and the complete lack of access control such a scheme requires in order to function.

* Install pre and post scripts in my DRM to erase cached files after each job is done (why have a cache at all?)

The cache still pays off for long queries: it speeds up local access. Think of it as caching per job rather than across jobs.
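A per-job cache keeps the benefit for long queries while making cleanup trivial. A sketch of what a DRM prolog/epilog pair might do, assuming a local scratch area (the base path and naming are my assumptions, not tied to any particular scheduler):

```python
import os
import shutil


def job_cache_dir(job_id, base="/scratch/mpiblast"):
    """Prolog: create a cache directory private to this job."""
    path = os.path.join(base, "job-%s" % job_id)
    os.makedirs(path, mode=0o700, exist_ok=True)
    return path


def clean_job_cache(job_id, base="/scratch/mpiblast"):
    """Epilog: remove the job's cache once the job completes."""
    path = os.path.join(base, "job-%s" % job_id)
    shutil.rmtree(path, ignore_errors=True)
```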


Is there another option that I'm missing here? How do folks handle this at large-ish installations with multiple users who each may run mpiblast jobs against shared targets?

Separate cache locations per user. Really fast access to shared fs. Really fast and large local fs.
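Per-user cache locations are easy to derive from the environment; a minimal sketch, assuming a shared base path (the layout is an assumption):

```python
import getpass
import os


def user_cache_dir(base="/scratch/blast-cache"):
    """Return (and create) a cache directory private to the calling user."""
    path = os.path.join(base, getpass.getuser())
    os.makedirs(path, mode=0o700, exist_ok=True)
    return path
```

Mode 0o700 keeps each user's cache isolated, which trades cache sharing for the safety discussed above.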


-Chris Dwan

-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

