Hi Olivier and Aaron,

I've been playing around with BLAST+, trying to tackle one fairly simple but unimportant issue and one more complex and problematic issue.

The easy one first: In the build log (on Launchpad) I see that during the test phase of the build there are various attempts to connect to the NCBI servers, starting at:

    ======================================================================
    blast_services_unit_test
    ======================================================================
    Running 23 test cases...
    Error: (311.22) SOCK#1000[?]: [SOCK::Connect] Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
    Error: (303.7) [URL_Connect] Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
    Error: (311.22) SOCK#2000[?]: [SOCK::Connect] Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
    Error: (303.7) [URL_Connect] Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
    Error: (311.22) SOCK#3000[?]: [SOCK::Connect] Failed SOCK_gethostbyname("www.ncbi.nlm.nih.gov")
    Error: (303.7) [URL_Connect] Socket connect to www.ncbi.nlm.nih.gov:80 failed: Unknown
    Error: (310.5) [blast4] Cannot locate server
    Error: (315.1) Cannot connect to service "blast4"
    Error: (315.2) CConn_Streambuf::CConn_Streambuf(): NULL connector (UNKNOWN): Unknown
    ...etc.

These don't break the build but they should really be disabled. Is there an easy way to do this, do you think? I've had a poke around in the Makefiles but it's fairly cryptic.

Anyway, the complex issue: A user reported that his analysis took an order of magnitude longer after upgrading BLAST+ (from the static binary build to the Debian Med build).
I'd expect some slowdown with dynamic linking but this is indeed fairly drastic:

Static (downloaded from NCBI):

    tbooth@barsukas[latest] time bash -c 'for (( c=1; c<=50; c++ )) ; do ~/tings/ncbi-blast-2.2.25+/bin/blastx -h > /dev/null ; done'
    0.76user 0.29system 0:00.94elapsed 110%CPU (0avgtext+0avgdata 39728maxresident)k
    32inputs+0outputs (2major+133193minor)pagefaults 0swaps

Dynamic (built with debuild):

    tbooth@barsukas[latest] time bash -c 'for (( c=1; c<=50; c++ )) ; do c++/BUILD/bin/blastx -h > /dev/null ; done'
    3.91user 8.91system 0:13.00elapsed 98%CPU (0avgtext+0avgdata 827376maxresident)k
    0inputs+0outputs (0major+2623550minor)pagefaults 0swaps

So assuming that printing the help message is trivial, and essentially a no-op, the Debian build is taking more than a quarter of a second to fire up. For scripts that call BLAST in a tight loop on small sequences this is a drastic slowdown: nearly all the analysis time is actually used up just starting BLAST.

For comparison, I tried timing Perl:

    tbooth@barsukas[latest] time bash -c 'for (( c=1; c<=50; c++ )) ; do perl -h > /dev/null ; done'
    0.36user 0.17system 0:00.21elapsed 244%CPU (0avgtext+0avgdata 6064maxresident)k
    0inputs+0outputs (0major+27317minor)pagefaults 0swaps

I know Perl is well optimised, but this is still a massive disparity.

I wondered if there was a way to speed up linking, so I had a play with 'prelink', but I realised this just helps starting the program the first time in the loop. After that the linking data is all cached anyway. Then I tried mashing all the .so files created by the build into one "libncbiblast_all.so" and linking to this. It compiled and ran but made no difference whatsoever to the startup time of blastx.

So, maybe I'm barking up the wrong tree and something other than the linking is causing the delay, or maybe there is just no way to get the speedup other than statically linking the binaries.
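
One way to check whether the dynamic loader really is the culprit might look like the sketch below. It is not something I have run against the Debian build. It assumes GNU date (for the %N nanosecond format) and a glibc system, where the loader honours LD_DEBUG=statistics. The helper names avg_startup_ms and loader_stats are made up for illustration.

```shell
# avg_startup_ms (hypothetical helper): run a command N times and print the
# mean wall-clock time per run in whole milliseconds.
# Assumption: GNU date, which supports %N (nanoseconds).
avg_startup_ms() {
    local runs=$1; shift
    local start end i=0
    start=$(date +%s%N)
    while [ "$i" -lt "$runs" ]; do
        "$@" > /dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / runs / 1000000 ))
}

# loader_stats (hypothetical helper): ask the glibc dynamic loader to report
# its own startup and relocation statistics for one run of a command.
# Assumption: glibc; LD_DEBUG output goes to stderr, so stderr is sent down
# the pipe and the program's normal stdout is discarded.
loader_stats() {
    LD_DEBUG=statistics "$@" 2>&1 > /dev/null |
        grep -E 'total startup time|relocation'
}

# Example usage, with the paths from the timings above:
#   avg_startup_ms 50 c++/BUILD/bin/blastx -h
#   loader_stats c++/BUILD/bin/blastx -h
```

If the loader's own figures turn out to be small next to the per-run average, that would suggest the time is going somewhere other than symbol resolution, which would fit with the merged .so making no difference.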
(I know I can't be the first person to try all this but I can't find any previous discussion/documentation.)

If static linking really is the only way to get the speed back, I know the real fix is for script authors to use BLAST more sensibly, but I'm wondering if there is any mileage in trying to make an ncbi-blast+-static package? This would build from the same source, and replace (dpkg-divert) the main binaries with static versions to give a quick-fix speedup at the cost of a big hunk of disk space. I've not actually tried making this yet, but what do you think?

Cheers,

TIM

--
To Err is human. To Arrr is Pirate!