Hi Guy,

sorry for not keeping you in the loop; I assumed all interested
people read the Debian Med mailing list.  Currently an Outreachy
student is doing QA work on Debian Med packages, trying to add test
suites.  Please read below about the problems with packages created by
RostLab.  Your comments would be very helpful.

Kind regards

        Andreas.

On Wed, Jul 13, 2016 at 09:21:20PM +0200, Andreas Tille wrote:
> Hi Tanya,
> 
> On Wed, Jul 13, 2016 at 07:24:12PM +0300, merlettaia wrote:
> > 
> > I found a problem in which this package is also involved.
> > Last weekend I started to work on predictprotein. The hardest problem was
> > to make it work at all.
> > At https://wiki.debian.org/DebianMed/PredictProtein I found
> > instructions, spent some time downloading the database, and when I had
> > downloaded and installed it and then ran predictprotein, I got multiple
> > error messages (output_with_errors.txt). It turned out that when one of the
> > Perl scripts in librg-utils-perl calls blastpgp on that database,
> >
> >   blastpgp -F F -a 1 -j 3 -b 3000 -e 1 -h 1e-3 \
> >     -d /data/src/rostlab-data/data/big/big_80 -i query.fasta \
> >     -o query.blastPsiOutTmp -C query.chk -Q query.blastPsiMat
> >
> > blastpgp ends with a "Killed" message and produces an incorrect output
> > file (query.blastPsiOutTmp is incomplete). The script in librg-utils-perl
> > is correct and the call in predictprotein is correct; blastpgp itself fails.
> >
> > I thought that an incorrect database format could be the reason, because
> > the version of the ncbi-blast+ package (blastpgp belongs to this package)
> > expects the latest version of that database, and the database from
> > RostLab's website probably isn't the latest.
> > I downloaded one of the databases from the NCBI FTP site
> > (ftp://ftp.ncbi.nlm.nih.gov/blast/db/) and tried to run predictprotein
> > with that data. It worked!
> > But now I get an error while metastudent runs (output in some_output.txt);
> > I'm working on fixing it now.
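A shell prints "Killed" when a child process is terminated by SIGKILL. On Linux the usual sender is the kernel's out-of-memory killer, which would fit a run against a database as large as big_80, but that is an assumption not verified in this thread (`dmesg` output from the failing machine would confirm or rule it out). A minimal Python sketch, using a hypothetical helper `run_and_classify`, shows how a test harness could tell a death by signal apart from an ordinary non-zero exit:

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd and describe how it ended (hypothetical helper).

    subprocess reports death by signal as a negative return code,
    e.g. -9 for SIGKILL, which a shell displays as "Killed".
    """
    proc = subprocess.run(cmd)
    if proc.returncode < 0:
        return f"killed by {signal.Signals(-proc.returncode).name}"
    if proc.returncode > 0:
        return f"exited with status {proc.returncode}"
    return "ok"


if __name__ == "__main__":
    # Stand-in for blastpgp: a child process that SIGKILLs itself.
    victim = [sys.executable, "-c",
              "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_and_classify(victim))
```

The point of the distinction is that a signal death says nothing about the database format, whereas a normal non-zero exit from blastpgp usually comes with an error message.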
> 
> Thanks for your very thorough investigation.  I have put Laszlo in CC;
> maybe he has some contact information or can help himself, even though
> he is no longer active in Debian Med.
>  
> > And there are two things I don't understand:
> > 
> > Is there any package which contains a copy of the current version of the
> > blastp database? Or a small part of it? It seems that the autopkgtest
> > test suite should use a smaller portion of the blastp database.
> 
> As far as I know there is no such package.  IMHO it might be a good idea
> to ship something like a stripped-down database since it could be used
> as test input data for several other packages.  What do others think?
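One possible way to build such a stripped-down database is to keep only the first few records of a protein FASTA file and index the result with makeblastdb (ncbi-blast+) or formatdb (ncbi-tools6). The Python sketch below covers only the subsetting step; the helper name `head_fasta` is hypothetical, and whether the first N records give meaningful hits for a given test query is not checked.

```python
def head_fasta(lines, n):
    """Yield the lines of the first n FASTA records (hypothetical helper).

    A record starts at a '>' header line and runs until the next header;
    any lines before the first header are skipped.
    """
    records = 0
    for line in lines:
        if line.startswith(">"):
            records += 1
            if records > n:
                return
        if records:
            yield line
```

For example, `sys.stdout.writelines(head_fasta(sys.stdin, 1000))` in a small script would turn a full FASTA file on stdin into a 1000-record subset on stdout, ready to be indexed.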
> 
> > For now it is unclear how to test predictprotein with autopkgtest, since
> > a correct run also requires a local copy of a (possibly) huge database
> > (~30GB for the copy from RostLab's website). Should ncbi-blast+/ncbi-tools6
> > perhaps download and install it?
> 
> For manual user tests this might be OK, but autopkgtest should be
> offline.
> 
> > Predictprotein has special parameters for different databases, and the
> > path to the BLAST installation can be provided by hand, which makes it
> > possible to call it with a smaller database in a test suite run.
> 
> Sounds convincing.
> 
> > But that will only work if blastpgp from ncbi-blast+ works correctly
> > with the same version of the database. So the better way would be to
> > install and test database usage in the ncbi-blast+ tests, and to use the
> > default database installed with ncbi-blast+ (if one gets installed).
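The logs further down show that predictprotein ultimately drives MakefilePP.mk with every database location passed as a make variable (BIGBLASTDB, BIG80BLASTDB, SWISSBLASTDB, ...). A test could therefore point all BLAST databases at one small test database. The sketch below only assembles such a command line; the helper names and the test database path are hypothetical, the other variables from the log (PFAM, PROSITE, ...) are left out for brevity, and whether a test should use predictprotein's own options instead of calling make directly is left open.

```python
def blast_db_overrides(small_db):
    """Make variable overrides pointing every BLAST database used by
    MakefilePP.mk at one small test database (hypothetical helper).

    The variable names are the ones visible in the predictprotein log.
    """
    names = ("BIGBLASTDB", "BIG80BLASTDB", "SWISSBLASTDB")
    return [f"{name}={small_db}" for name in names]


def make_command(work_dir, small_db, jobid="query"):
    """Assemble an abridged version of the logged make invocation."""
    return [
        "make", "--no-builtin-rules", f"INFILE={jobid}.in",
        "-C", work_dir, f"JOBID={jobid}", "-j", "1", "BLASTCORES=1",
        *blast_db_overrides(small_db),
        "-f", "/usr/share/predictprotein/MakefilePP.mk", "all",
    ]
```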
> > 
> > Could you also check whether the database from
> > https://wiki.debian.org/DebianMed/PredictProtein really doesn't work? I
> > have an unstable internet connection and am not sure the file was not
> > corrupted.
> 
> Any volunteer for this?  My internet is currently also not the best.
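If a reference checksum for the download is available, corruption from an unstable connection can be ruled out without downloading again. NCBI publishes an `.md5` file next to each archive under ftp://ftp.ncbi.nlm.nih.gov/blast/db/; whether RostLab does the same for its tarball is not known here. A minimal Python sketch, similar in spirit to `md5sum -c`, assuming the md5sum-style "digest  filename" line format:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a (possibly multi-GB) file in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(path, md5_path):
    """Compare against the first field of an md5sum-style checksum file."""
    with open(md5_path) as fh:
        expected = fh.read().split()[0].lower()
    return md5_of_file(path) == expected
```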
> 
> Kind regards
> 
>        Andreas. 
> 
> 
> > cache merging is off at /usr/bin/predictprotein line 230.
> > work_dir=/data/src/temp at /usr/bin/predictprotein line 336.
> > make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 
> > BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ 
> > PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 
> > PROFROOT=/usr/share/profphd/prof/ 
> > BIGBLASTDB=/data/src/rostlab-data/data/aa/pdbaa 
> > BIG80BLASTDB=/data/src/rostlab-data/data/aa/pdbaa 
> > PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls 
> > PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm 
> > PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat 
> > PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat 
> > PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl 
> > SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt 
> > SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot 
> > NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk 
> > all norsp at /usr/bin/predictprotein line 383.
> > make: Entering directory '/data/src/temp'
> > metastudent -i query.fasta -o query.metastudent --silent  --debug
> > mkdir -p /tmp/metastudentulQjHj/methodC;cd 
> > /usr/lib/python2.7/dist-packages/metastudentPkg/lib/groupC;./CafaWrapper3.pl
> >  /tmp/metastudentulQjHj/query.fasta_eval1.0_iters3_srcgoasp.mfo.blast 
> > /tmp/metastudentulQjHj/methodC/output.MFO.txt 0 
> > /tmp/metastudentulQjHj/methodC
> > !!!Error!!! mkdir -p /tmp/metastudentulQjHj/methodC;cd 
> > /usr/lib/python2.7/dist-packages/metastudentPkg/lib/groupC;./CafaWrapper3.pl
> >  /tmp/metastudentulQjHj/query.fasta_eval1.0_iters3_srcgoasp.mfo.blast 
> > /tmp/metastudentulQjHj/methodC/output.MFO.txt 0 
> > /tmp/metastudentulQjHj/methodC
> > 65280
> > Can't use a hash as a reference at /usr/share/perl5/GO/IO/Dotty.pm line 104.
> > Compilation failed in require at ./treehandler.pl line 10.
> > BEGIN failed--compilation aborted at ./treehandler.pl line 10.
> > ./treehandler.pl -mfo transitiveClosure2014.txt -bpo 
> > transitiveClosure2014.txt -cco transitiveClosure2014.txt -method 3 -pred 
> > /tmp/metastudentulQjHj/methodC/blast.out -scoring 0 failed: 255 at 
> > ./CafaWrapper3.pl line 16.
> > Error occurred: IOError
> > Traceback (most recent call last):
> >   File "/usr/bin/metastudent", line 721, in <module>
> >     runIt(tempfile, inputFastaFilePath, outputFilePath, outputBlast, 
> > blastKickstartDatabasePaths, ontologies, blastOnly, keepTemp, allPreds, 
> > debug, noNames, withImages)
> >   File "/usr/bin/metastudent", line 187, in runIt
> >     predLinesDict["C"] = runMethodC(blastKickstartDatabasePath, 
> > fastaFilePathLocal, tmpDirPath, configMap["GROUP_C_SCORING_%s" % (ontology) 
> > ], ontology, configMap, debug)
> >   File "/usr/lib/python2.7/dist-packages/metastudentPkg/runMethods.py", 
> > line 206, in runMethodC
> >     with open(outputFilePath) as f:
> > IOError: [Errno 2] No such file or directory: 
> > '/tmp/metastudentulQjHj/methodC/output.MFO.txt'
> > /usr/share/predictprotein/MakefilePP.mk:403: recipe for target 
> > 'query.metastudent.BPO.txt' failed
> > make: *** [query.metastudent.BPO.txt] Error 1
> > make: Leaving directory '/data/src/temp'
> > make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 
> > BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ 
> > PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 
> > PROFROOT=/usr/share/profphd/prof/ 
> > BIGBLASTDB=/data/src/rostlab-data/data/aa/pdbaa 
> > BIG80BLASTDB=/data/src/rostlab-data/data/aa/pdbaa 
> > PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls 
> > PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm 
> > PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat 
> > PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat 
> > PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl 
> > SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt 
> > SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot 
> > NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk 
> > all norsp failed: 512 at /usr/bin/predictprotein line 392.
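The bare numbers in this log (65280 after the metastudent error, "failed: 512" from predictprotein) look like raw Unix wait statuses rather than exit codes, presumably from Python's `os.system()` in metastudent and Perl's `system()` in predictprotein (not checked against their sources). The exit code sits in the high byte, so 512 means make exited with status 2 and 65280 means status 255. A small Python check:

```python
import os


def describe_wait_status(status):
    """Turn a raw Unix wait status (as returned by os.system(), or as
    found in Perl's $? after system()) into a readable description."""
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    if os.WIFEXITED(status):
        return f"exit status {os.WEXITSTATUS(status)}"
    return f"unrecognised status {status}"


for raw in (512, 65280, 9):
    print(raw, "->", describe_wait_status(raw))
# 512 -> exit status 2
# 65280 -> exit status 255
# 9 -> killed by signal 9
```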
> 
> > cache merging is off at /usr/bin/predictprotein line 230.
> > work_dir=/data/src/temp at /usr/bin/predictprotein line 336.
> > make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 
> > BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ 
> > PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 
> > PROFROOT=/usr/share/profphd/prof/ 
> > BIGBLASTDB=/data/src/rostlab-data/data/big/big 
> > BIG80BLASTDB=/data/src/rostlab-data/data/big/big_80 
> > PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls 
> > PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm 
> > PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat 
> > PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat 
> > PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl 
> > SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt 
> > SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot 
> > NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk 
> > all norsp at /usr/bin/predictprotein line 383.
> > make: Entering directory '/data/src/temp'
> > make: Warning: File 'query.in' has modification time 3.2 s in the future
> > /usr/share/librg-utils-perl//copf.pl query.in formatIn=fasta 
> > formatOut=fasta fileOut=query.fasta exeConvertSeq=convert_seq
> > /usr/share/librg-utils-perl//copf.pl query.in formatIn=fasta formatOut=gcg 
> > fileOut=query.seqGCG exeConvertSeq=convert_seq
> > ncbi-seg query.fasta -x > query.segNorm
> > /usr/share/librg-utils-perl//copf.pl query.segNorm formatOut=gcg 
> > fileOut=query.segNormGCG
> > # blast call may throw warnings on STDERR - silence it when we are not in 
> > debug mode; blastpgp and blastall create a normally 0-sized 'error.log' - 
> > remove it
> > trap "rm -f error.log" EXIT; \
> > if ! ( blastpgp -F F -a 1 -j 3 -b 3000 -e 1 -h 1e-3 -d 
> > /data/src/rostlab-data/data/big/big_80 -i query.fasta -o 
> > query.blastPsiOutTmp -C query.chk -Q query.blastPsiMat   ); then \
> >     EXIT=$?; cat error.log >&2; exit $EXIT; \
> > fi
> > Killed
> > cat: error.log: No such file or directory
> > # blast call may throw warnings on STDERR - silence it when we are not in 
> > debug mode
> > trap "rm -f error.log" EXIT; \
> > if ! ( blastpgp -F F -a 1 -b 1000 -e 1 -d 
> > /data/src/rostlab-data/data/big/big -i query.fasta -o query.blastPsiAli.nz 
> > -R query.chk   ); then \
> >     EXIT=$?; cat error.log >&2; exit $EXIT; \
> > fi
> > [blastpgp] WARNING: -t larger than 1 not supported when restarting from a 
> > checkpoint; setting -t to 1
> > 
> > [blastpgp] WARNING: posReadCheckpoint: Attempting to recover data from 
> > previous checkpoint
> > 
> > [blastpgp] WARNING: posReadPosFreqsStandard: Could not open checkpoint file
> > 
> > [blastpgp] WARNING: posReadCheckpoint: Data recovery failed
> > 
> > [blastpgp] FATAL ERROR: blast: Error recovering from checkpoint
> > cat: error.log: No such file or directory
> > gzip -c -6 < 'query.blastPsiAli.nz' > 'query.blastPsiAli.gz'
> > # lkajan: we have to switch off filtering (default for blastpgp) or 
> > sequences like ASDSADADASDASDASDSADASA fail with
> > # 'WARNING: query: Could not calculate ungapped Karlin-Altschul parameters 
> > due to an invalid query sequence or its translation. Please verify the 
> > query sequence(s) and/or filtering options'
> > # Does switching off filtering hurt us? Loctree uses the results of this 
> > for extracting keywords from swissprot, so I am not worried.
> > # This blast call also often writes 'Selenocysteine (U) at position 59 
> > replaced by X' - we are not really interested. Silence this in non-debug 
> > mode.
> > trap "rm -f error.log" EXIT; \
> > if ! ( blastall -F F -a 1 -p blastp -d 
> > /data/src/rostlab-data/data/swissprot/uniprot_sprot -b 1000 -e 100 -m 8 -i 
> > query.fasta -o query.blastpSwissM8   ); then \
> >     EXIT=$?; cat error.log >&2; exit $EXIT; \
> > fi
> > /usr/share/librg-utils-perl//blastpgp_to_saf.pl 
> > fileInBlast=query.blastPsiOutTmp fileInQuery=query.fasta  
> > fileOutRdb=query.blastPsi80Rdb fileOutSaf=query.safBlastPsi80 red=100 
> > maxAli=3000 tile=0
> > opened query.fasta at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 
> > 126.
> > blastfile: query.blastPsiOutTmp at 
> > /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 127.
> > nohits: 0 at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 128.
> > iter: 0 at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 129.
> > blast+: 0 at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 130.
> > Died at /usr/share/librg-utils-perl//blastpgp_to_saf.pl line 76.
> > *** ERROR blastpgp_to_saf.pl : *** ERROR blastp_to_saf: blast file format 
> > not recognized
> > /usr/share/predictprotein/MakefilePP.mk:465: recipe for target 
> > 'query.safBlastPsi80' failed
> > make: *** [query.safBlastPsi80] Error 255
> > rm query.blastPsi80Rdb query.blastPsiAli.nz
> > make: Leaving directory '/data/src/temp'
> > make --no-builtin-rules INFILE=query.in -C /data/src/temp JOBID=query -j 1 
> > BLASTCORES=1 LIBRGUTILS=/usr/share/librg-utils-perl/ 
> > PPROOT=/usr/share/predictprotein/ PROFNUMRESMIN=17 
> > PROFROOT=/usr/share/profphd/prof/ 
> > BIGBLASTDB=/data/src/rostlab-data/data/big/big 
> > BIG80BLASTDB=/data/src/rostlab-data/data/big/big_80 
> > PFAM2DB=/data/src/rostlab-data/data/pfam_legacy/Pfam_ls 
> > PFAM3DB=/data/src/rostlab-data/data/pfam/Pfam-A.hmm 
> > PROSITEDAT=/data/src/rostlab-data/data/prosite/prosite.dat 
> > PROSITECONVDAT=/data/src/rostlab-data/data/prosite/prosite_convert.dat 
> > PSICEXE=/usr/share/rost-runpsic/runNewPSIC.pl 
> > SPKEYIDX=/data/src/rostlab-data/data/swissprot/keyindex_loctree.txt 
> > SWISSBLASTDB=/data/src/rostlab-data/data/swissprot/uniprot_sprot 
> > NORSPCTRL="--win=100" DEBUG=1 -f /usr/share/predictprotein/MakefilePP.mk 
> > all norsp failed: 512 at /usr/bin/predictprotein line 392.
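The MakefilePP.mk recipes in this log wrap each BLAST call as `if ! ( cmd ); then EXIT=$?; cat error.log >&2; exit $EXIT; fi`. Inside the `then` branch, `$?` holds the status of the negated test `! ( cmd )`, which is 0, so the recipe exits successfully even after blastpgp was killed. That would explain why make carried on to the checkpoint restart and only stopped later in blastpgp_to_saf.pl. The sketch below demonstrates this with `/bin/sh` from Python, next to a `cmd || { EXIT=$?; ...; }` variant that keeps the real status; the replacement idiom is a suggestion, not a tested patch to MakefilePP.mk.

```python
import subprocess

# Same shape as the MakefilePP.mk recipe; "( exit 3 )" stands in for a
# failing blastpgp call (the error.log handling is omitted).
BROKEN = "if ! ( exit 3 ); then EXIT=$?; exit $EXIT; fi"
# Reading $? directly after the command keeps its real exit status.
FIXED = "( exit 3 ) || { EXIT=$?; exit $EXIT; }"


def shell_status(script):
    """Run a snippet with /bin/sh and return its exit status."""
    return subprocess.run(["sh", "-c", script]).returncode


print(shell_status(BROKEN))  # 0: the failure is swallowed
print(shell_status(FIXED))   # 3: the failure is propagated
```

Inside the Makefile itself, `$` has to be doubled, so the variable reads `$$?` there.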
> 
> 
> -- 
> http://fam-tille.de
> 
> 

-- 
http://fam-tille.de
