Short:
Aaron et al., if I think I've found a bug in the NCBI toolbox, what channels are appropriate for reporting it?



Long:

I accept that I'm the only person on the planet who uses this. That's cool, even a bit exciting. But there is definitely a bug, and as I pursue it, it's helpful to document the hunt and see if anything I encounter rings a bell for others.

In the last installment:
Michael Cariaso wrote:
> I'm starting to suspect that the ncbi patch does not alter a code path
> related to -l gi filtering.

I can now be quite specific about where the problem occurs, and it looks like it may be a pure NCBI bug. It seems the GI list is being processed inside a loop unnecessarily.





The problem only occurs when using the -l switch. mpiblast quickly enters a loop that uses all the CPU on the master node and leaves the workers idle. The loop will eventually terminate, but it takes so long that it makes mpiblast useless. This did not occur with mpiblast 1.3, but does happen under mpiblast 1.4.


I don't have any debugging tools suitable for MPI work, but I'm getting good mileage from adding lines like:

fprintf(stderr, "%s:%d %ld message\n", __FILE__, __LINE__, (long) time(NULL));
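To avoid retyping that fprintf() at every call site, the same idea can be wrapped in a macro. This is only a minimal sketch: format_trace() and TRACE() are names I made up for illustration, and they are not part of the NCBI toolbox or mpiblast.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper, not part of the NCBI toolbox or mpiblast.
 * Formats "file:line timestamp message" into buf and returns the
 * number of characters snprintf() would have written. */
static int format_trace(char *buf, size_t size, const char *file,
                        int line, time_t now, const char *msg)
{
    /* time_t is not guaranteed to be an int, so cast it explicitly. */
    return snprintf(buf, size, "%s:%d %ld %s", file, line, (long) now, msg);
}

/* Prints one trace line to stderr, stamped with the call site. */
#define TRACE(msg)                                                 \
    do {                                                           \
        char trace_buf_[512];                                      \
        format_trace(trace_buf_, sizeof trace_buf_, __FILE__,      \
                     __LINE__, time(NULL), (msg));                 \
        fprintf(stderr, "%s\n", trace_buf_);                       \
    } while (0)
```

A call site then shrinks to something like TRACE("start GetGisFromFile()"); and the output keeps the same file:line timestamp layout as the log below.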



Here is some debugging info I produced. The blank lines were added after the fact, and indicate places where significant time is being consumed. My line numbers will be slightly larger than yours due to the fprintf()s shifting things down. All files mentioned are in ncbi/tools/.


blastutl.c:4917 1136076574 BioseqBlastEngineWithCallbackMult()
blastutl.c:4959 1136076574 : BioseqBlastEngineByLocWithCallbackMult
blastutl.c:5051 1136076574 : not an rps blast
blast.c:5873 1136076574 : BLASTSetUpSearchByLocWithReadDbEx
blast.c:5721 1136076574 start of BLASTSetUpSearchWithReadDbInternalMult()
blast.c:5776 1136076574 call BlastProcessGiLists
blastool.c:2791 1136076574 : begin BlastProcessGiLists()
blastool.c:2815 1136076574 start GetGisFromFile()

blastool.c:2818 1136076578 end GetGisFromFile()
blastool.c:2824 1136076578 AGAIN no bglp?!
blastool.c:2875 1136076578 no options->gilist?
blastool.c:2878 1136076578
blastool.c:2546 1136076578 begin BlastCreateVirtualOIDList
blastool.c:2569 1136076578 begin slow section1 (total=3649718) iscalculated=0

blastool.c:2584 1136076618 end slow section 1 real_ngis=3649718
blastool.c:2743 1136076619 sort gi list

blastool.c:2748 1136076629
blastool.c:2759 1136076630 end createvirtualoidlist()
blastool.c:2883 1136076630 end BlastProcessGiLists()
blast.c:5778 1136076630 back from BlastProcessGiLists

blast.c:5830 1136076632 exiting from BLASTSetUpSearchWithReadDbInternalMult()
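The gaps in that log can be measured rather than eyeballed. Below is a rough sketch that parses two trace lines and returns the seconds between them. parse_trace_line() and trace_gap() are hypothetical helpers of mine, and they assume every line has the "file:line timestamp message" layout shown above.

```c
#include <stdio.h>

/* Hypothetical helper for reading the trace lines shown above.
 * Parses "file:line timestamp ..." and returns 1 on success, 0 otherwise. */
static int parse_trace_line(const char *text, char file[64],
                            int *line, long *stamp)
{
    return sscanf(text, "%63[^:]:%d %ld", file, line, stamp) == 3;
}

/* Seconds elapsed between two trace lines, or -1 if either fails to parse. */
static long trace_gap(const char *earlier, const char *later)
{
    char file[64];
    int line;
    long t0, t1;

    if (!parse_trace_line(earlier, file, &line, &t0) ||
        !parse_trace_line(later, file, &line, &t1)) {
        return -1;
    }
    return t1 - t0;
}
```

Applied to the timestamps in the log, one pass through the block takes 58 seconds (1136076574 to 1136076632): about 4 seconds in GetGisFromFile(), 40 seconds in "slow section 1", and 10 seconds sorting the GI list.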




That block simply repeats over and over. It shows that the call stack leading to the problem is:


blastutl.c:BioseqBlastEngineWithCallbackMult()
blastutl.c:BioseqBlastEngineByLocWithCallbackMult()
blast.c   :BLASTSetUpSearchByLocWithReadDbEx()
blast.c   :BLASTSetUpSearchWithReadDbInternalMult()
blastool.c:BlastProcessGiLists()

It's here in blastool.c:BlastProcessGiLists() that the time is consumed. What it's doing seems quite reasonable, but it should probably only need to be done once.


The problem probably occurs here:


if (options->gifile) {
    if ((tmp_list = GetGisFromFile(options->gifile, &ngis))) {
        if (bglp) {
            bglp_tmp = IntersectBlastGiLists(tmp_list, ngis,
                        bglp->gi_list, bglp->total);
        } else {
            bglp_tmp = CombineDoubleInt4Lists(tmp_list, ngis, NULL, 0);
        }


        BlastGiListDestruct(bglp, TRUE);
        bglp = bglp_tmp;
        bglp_tmp = NULL;
        tmp_list = (BlastDoubleInt4Ptr) MemFree(tmp_list);
        ngis = 0;
    }
}




With my extra fprintf() calls added to show where the trace messages come from, it looks like this:

if (options->gifile) {
    /* time_t is not guaranteed to be an int, so cast it for printing */
    fprintf(stderr, "%s:%d %ld start GetGisFromFile()\n",
                      __FILE__, __LINE__, (long) time(NULL));

    if ((tmp_list = GetGisFromFile(options->gifile, &ngis))) {
        fprintf(stderr, "%s:%d %ld end GetGisFromFile()\n",
                          __FILE__, __LINE__, (long) time(NULL));
        if (bglp) {
            fprintf(stderr, "%s:%d %ld NEVER has bglp?!\n",
                          __FILE__, __LINE__, (long) time(NULL));
            bglp_tmp = IntersectBlastGiLists(tmp_list, ngis,
                        bglp->gi_list, bglp->total);
        } else {
            fprintf(stderr, "%s:%d %ld AGAIN no bglp?!\n",
                          __FILE__, __LINE__, (long) time(NULL));
            bglp_tmp = CombineDoubleInt4Lists(tmp_list, ngis, NULL, 0);
        }


        BlastGiListDestruct(bglp, TRUE);
        bglp = bglp_tmp;
        bglp_tmp = NULL;
        tmp_list = (BlastDoubleInt4Ptr) MemFree(tmp_list);
        ngis = 0;
    }
}





The problem may be as simple as this:

options->gifile is always set, as it should be. But the GIs only need to be read once, and the current NCBI code doesn't recognize this. Perhaps the list should have been persisted via the gi_list argument? If so, I don't see any code that does that. Either this code needs to be modified so that the file is read only once, or it shouldn't be called here at all and some other code path should take its place.
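To make the "read only once" idea concrete, here is a small standalone sketch of caching a GI list by file name. It is not the NCBI code and not a proposed patch: GiCache, load_gis() and get_gis_cached() are hypothetical names, and the loader assumes a plain text file with one integer GI per line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the toolbox's GI list; not the NCBI types. */
typedef struct {
    char *path;   /* file the list was read from */
    int  *gis;    /* GIs read from that file */
    int   count;  /* number of entries in gis */
    int   loads;  /* how many times the file was actually parsed */
} GiCache;

/* Reads whitespace-separated integer GIs. Returns 0 on success, -1 on failure. */
static int load_gis(const char *path, int **out, int *count)
{
    FILE *fp = fopen(path, "r");
    size_t capacity = 16;
    int n = 0, gi;
    int *list;

    if (fp == NULL) return -1;
    list = malloc(capacity * sizeof *list);
    if (list == NULL) { fclose(fp); return -1; }

    while (fscanf(fp, "%d", &gi) == 1) {
        if ((size_t) n == capacity) {
            int *grown = realloc(list, 2 * capacity * sizeof *list);
            if (grown == NULL) { free(list); fclose(fp); return -1; }
            list = grown;
            capacity *= 2;
        }
        list[n++] = gi;
    }
    fclose(fp);
    *out = list;
    *count = n;
    return 0;
}

/* Returns the cached list, parsing the file only when the path changes. */
static const int *get_gis_cached(GiCache *cache, const char *path, int *count)
{
    if (cache->path == NULL || strcmp(cache->path, path) != 0) {
        int *fresh, n;
        char *copy;

        if (load_gis(path, &fresh, &n) != 0) return NULL;
        copy = malloc(strlen(path) + 1);
        if (copy == NULL) { free(fresh); return NULL; }
        strcpy(copy, path);

        free(cache->path);
        free(cache->gis);
        cache->path = copy;
        cache->gis = fresh;
        cache->count = n;
        cache->loads++;
    }
    *count = cache->count;
    return cache->gis;
}
```

With a zero-initialized GiCache, a second call with the same file name returns the list already in memory and leaves loads at 1. Whether the toolbox has a natural place to keep such state between calls is exactly what I don't know yet.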


If I am seeing a problem with the toolbox,
1) I need to verify this outside of mpiblast, in the standard blast version.

2) I need contact with the folks at NCBI. Aside from the helpdesk, perhaps Aaron and others on this list have more suitable channels?

3) Is there a more suitable discussion forum for others who are deep in the NCBI toolbox?


































_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users
