Hi Syed, that sounds like a very doable solution; I'll try it.
Appreciate your suggestions :)

Best,
-elfar

On Sat, Jan 30, 2010 at 4:52 PM, Syed Haider <[email protected]> wrote:
> Hi Elfar,
>
> the following may need writing a few more lines of code, but it will work
> with your existing workflows. What you may consider doing is to first
> retrieve all the gene ids or transcript ids, depending upon which sequence
> type you are interested in. You can do this either from the web interface
> or from your script. Once you have these, split them into smaller groups,
> say 1000 each, and then send multiple queries with these ids as filter
> values.
>
> Hope this will do the trick.
>
> Best,
> Syed
>
>
> Elfar Torarinsson wrote:
>>
>> Hi Syed,
>>
>> thanks for your answer. I have a couple of issues with that solution.
>> First of all, I have often experienced that this feature fails, that is,
>> I never receive the mail, especially when requesting large amounts of
>> data. The other thing is that I wanted to be able to do this
>> automatically, in a cronjob for example, and although I assume this is
>> possible, it will require somewhat more scripting than I was planning
>> on doing for this (unless there is some smart option here I'm
>> overlooking).
>>
>> Best,
>>
>> Elfar
>>
>>
>> On Sat, Jan 30, 2010 at 3:47 PM, Syed Haider <[email protected]>
>> wrote:
>>>
>>> Hi Elfar,
>>>
>>> the best option is to download them using the web browser's Export
>>> (email) option. This will compile the results on the server side and
>>> then send you a link by email.
>>>
>>> Best,
>>> Syed
>>>
>>>
>>> Elfar Torarinsson wrote:
>>>>
>>>> Hi,
>>>>
>>>> I was trying to automate regular downloads of human CDS (and UTRs)
>>>> using BioMart. I have tried it using the Perl script generated at
>>>> BioMart:
>>>>
>>>> use strict;
>>>> use BioMart::Initializer;
>>>> use BioMart::Query;
>>>> use BioMart::QueryRunner;
>>>>
>>>> my $confFile =
>>>>     "/home/projects/ensembl/biomart-perl/conf/apiExampleRegistry.xml";
>>>> my $action = 'cached';
>>>> my $initializer = BioMart::Initializer->new('registryFile'=>$confFile,
>>>>                                             'action'=>$action);
>>>> my $registry = $initializer->getRegistry;
>>>>
>>>> my $query =
>>>>     BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');
>>>>
>>>> $query->setDataset("hsapiens_gene_ensembl");
>>>> $query->addAttribute("ensembl_gene_id");
>>>> $query->addAttribute("ensembl_transcript_id");
>>>> $query->addAttribute("coding");
>>>> $query->addAttribute("external_gene_id");
>>>>
>>>> $query->formatter("FASTA");
>>>>
>>>> my $query_runner = BioMart::QueryRunner->new();
>>>> # to obtain unique rows only
>>>> $query_runner->uniqueRowsOnly(1);
>>>>
>>>> $query_runner->execute($query);
>>>> $query_runner->printHeader();
>>>> $query_runner->printResults();
>>>> $query_runner->printFooter();
>>>>
>>>> This only retrieves a few sequences and then starts returning
>>>> "Problems with the web server: 500 read timeout"
>>>>
>>>> I have also tried posting the XML using LWP in Perl; this downloads
>>>> more sequences, but it also stops after a while, before downloading
>>>> all the sequences:
>>>>
>>>> use strict;
>>>> use LWP::UserAgent;
>>>>
>>>> open (FH, $ARGV[0]) || die ("\nUsage: perl postXML.pl Query.xml\n\n");
>>>> my $xml;
>>>> while (<FH>) {
>>>>     $xml .= $_;
>>>> }
>>>> close(FH);
>>>>
>>>> my $path = "http://www.biomart.org/biomart/martservice?";
>>>> my $request =
>>>>     HTTP::Request->new("POST",$path,HTTP::Headers->new(),'query='.$xml."\n");
>>>> my $ua = LWP::UserAgent->new;
>>>> $ua->timeout(30000000);
>>>> my $response;
>>>>
>>>> $ua->request($request,
>>>>     sub {
>>>>         my ($data, $response) = @_;
>>>>         if ($response->is_success) {
>>>>             print "$data";
>>>>         }
>>>>         else {
>>>>             warn ("Problems with the web server: ".$response->status_line);
>>>>         }
>>>>     }, 500);
>>>>
>>>> I have managed to download all the sequences using the browser before,
>>>> but it required several tries and I had to get them gzipped (also so
>>>> I could be sure I got all of them when gunzipping them).
>>>>
>>>> So, my question is: is there anything I can do to be able to download
>>>> all the sequences? I.e. avoid timeouts, some easy, systematic way to
>>>> split my calls into much smaller calls, or something else?
>>>>
>>>> Thanks,
>>>>
>>>> Elfar
>
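
A minimal sketch of the batching Syed describes, reusing the biomart-perl API
from the script in the thread. The id file transcript_ids.txt, the chunk size
of 1000, and the ensembl_transcript_id filter name are assumptions for
illustration (not confirmed in the thread); the id list itself could come from
a one-off web-interface export or a separate attribute-only query.

use strict;
use warnings;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# one Ensembl transcript id per line; exported once from the web interface
# or produced by a separate attribute-only query (placeholder file name)
open(my $ids_fh, '<', 'transcript_ids.txt') or die "cannot open id list: $!";
chomp(my @ids = <$ids_fh>);
close($ids_fh);

my $confFile =
    "/home/projects/ensembl/biomart-perl/conf/apiExampleRegistry.xml";
my $initializer = BioMart::Initializer->new('registryFile' => $confFile,
                                            'action'       => 'cached');
my $registry = $initializer->getRegistry;

my $chunk_size = 1000;
while (my @chunk = splice(@ids, 0, $chunk_size)) {
    my $query = BioMart::Query->new('registry'          => $registry,
                                    'virtualSchemaName' => 'default');
    $query->setDataset("hsapiens_gene_ensembl");
    # restrict this query to the current batch of ids (filter name as used
    # by scripts generated from the Ensembl mart; treat it as an assumption)
    $query->addFilter("ensembl_transcript_id", [@chunk]);
    $query->addAttribute("ensembl_gene_id");
    $query->addAttribute("ensembl_transcript_id");
    $query->addAttribute("coding");
    $query->formatter("FASTA");

    my $query_runner = BioMart::QueryRunner->new();
    $query_runner->uniqueRowsOnly(1);
    $query_runner->execute($query);
    $query_runner->printResults();   # FASTA for this batch goes to STDOUT
}

Each batch is a short-lived query, so no single request has to hold the
connection long enough to hit the 500 read timeout.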

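The same batching can also be sketched against the martservice endpoint used
in postXML.pl, building one XML query per chunk and passing the ids as a
comma-separated Filter value. Again, transcript_ids.txt, the chunk size, and
the filter name are illustrative assumptions.

use strict;
use warnings;
use LWP::UserAgent;

# same placeholder id list as above, one transcript id per line
open(my $ids_fh, '<', 'transcript_ids.txt') or die "cannot open id list: $!";
chomp(my @ids = <$ids_fh>);
close($ids_fh);

my $path = "http://www.biomart.org/biomart/martservice";
my $ua   = LWP::UserAgent->new(timeout => 300);

while (my @chunk = splice(@ids, 0, 1000)) {
    my $id_list = join(',', @chunk);
    my $xml = <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="1" count="">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="ensembl_transcript_id" value="$id_list"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Attribute name="coding"/>
  </Dataset>
</Query>
XML
    # POST the batch as a 'query' form parameter, as postXML.pl does
    my $response = $ua->post($path, { query => $xml });
    if ($response->is_success) {
        print $response->content;            # FASTA for this batch
    }
    else {
        warn "batch failed: " . $response->status_line . "\n";
        # a cron job would retry this batch instead of giving up
    }
}

Keeping each batch small enough to finish well under the server timeout, and
retrying any failed batch, is what makes this workable as an unattended cron
job.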