Hi Syed, thanks for your answer. I have a couple of issues with that solution. First, I have often experienced that this feature fails, i.e. I never receive the email, especially when requesting large amounts of data. Second, I wanted to be able to do this automatically, in a cronjob for example, and although I assume that is possible, it would require somewhat more scripting than I was planning on doing for this (unless there is some smart option here that I'm overlooking).
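
Just to illustrate the kind of scripting I was trying to avoid: the only systematic splitting I can think of is one martservice POST per chromosome, along the lines of the sketch below. This is only a rough idea, not something I have tested end to end: I am assuming here that the chromosome_name filter on hsapiens_gene_ensembl is the right way to chunk the query, and the attributes are the same ones as in my script further down.

use strict;
use warnings;
use LWP::UserAgent;

# One POST per chromosome instead of one huge query (sketch only;
# chromosome_name as a chunking filter is my assumption).
my @chromosomes = (1 .. 22, 'X', 'Y', 'MT');
my $path = "http://www.biomart.org/biomart/martservice?";
my $ua   = LWP::UserAgent->new(timeout => 600);

for my $chr (@chromosomes) {
    my $xml = <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="1">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="chromosome_name" value="$chr"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Attribute name="coding"/>
    <Attribute name="external_gene_id"/>
  </Dataset>
</Query>
XML
    # Form-encoded POST, one result file per chromosome.
    my $response = $ua->post($path, { query => $xml });
    if ($response->is_success) {
        open(my $out, '>', "cds_chr$chr.fa") or die "cds_chr$chr.fa: $!";
        print {$out} $response->decoded_content;
        close($out);
    }
    else {
        warn "chromosome $chr failed: " . $response->status_line . "\n";
    }
}

That would be easy enough to drop into a crontab, but I was hoping BioMart already offered something like this server-side before I start maintaining it myself.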
Best,
Elfar

On Sat, Jan 30, 2010 at 3:47 PM, Syed Haider <[email protected]> wrote:
> Hi Elfar,
>
> the best option is to download them using the web browser's Export (email option). This
> will compile the results on the server side and then send you a link by email.
>
> Best,
> Syed
>
>
> Elfar Torarinsson wrote:
>>
>> Hi,
>>
>> I was trying to automate regular downloads of human CDS (and UTRs)
>> using BioMart. I have tried it using the Perl script generated at
>> biomart:
>>
>> use strict;
>> use BioMart::Initializer;
>> use BioMart::Query;
>> use BioMart::QueryRunner;
>>
>> my $confFile = "/home/projects/ensembl/biomart-perl/conf/apiExampleRegistry.xml";
>> my $action = 'cached';
>> my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
>> my $registry = $initializer->getRegistry;
>>
>> my $query = BioMart::Query->new('registry'=>$registry, 'virtualSchemaName'=>'default');
>>
>> $query->setDataset("hsapiens_gene_ensembl");
>> $query->addAttribute("ensembl_gene_id");
>> $query->addAttribute("ensembl_transcript_id");
>> $query->addAttribute("coding");
>> $query->addAttribute("external_gene_id");
>>
>> $query->formatter("FASTA");
>>
>> my $query_runner = BioMart::QueryRunner->new();
>> # to obtain unique rows only
>> $query_runner->uniqueRowsOnly(1);
>>
>> $query_runner->execute($query);
>> $query_runner->printHeader();
>> $query_runner->printResults();
>> $query_runner->printFooter();
>>
>> This only retrieves a few sequences and then starts returning
>> "Problems with the web server: 500 read timeout".
>>
>> I have also tried posting the XML using LWP in Perl. This downloads
>> more sequences, but it also stops after a while, before all the
>> sequences have been downloaded:
>>
>> use strict;
>> use LWP::UserAgent;
>>
>> open(FH, $ARGV[0]) || die("\nUsage: perl postXML.pl Query.xml\n\n");
>> my $xml;
>> while (<FH>) {
>>     $xml .= $_;
>> }
>> close(FH);
>>
>> my $path = "http://www.biomart.org/biomart/martservice?";
>> my $request = HTTP::Request->new("POST", $path, HTTP::Headers->new(), 'query='.$xml."\n");
>> my $ua = LWP::UserAgent->new;
>> $ua->timeout(30000000);
>> my $response;
>>
>> $ua->request($request,
>>     sub {
>>         my ($data, $response) = @_;
>>         if ($response->is_success) {
>>             print "$data";
>>         }
>>         else {
>>             warn("Problems with the web server: ".$response->status_line);
>>         }
>>     }, 500);
>>
>> I have managed to download all the sequences using the browser before,
>> but it required several tries and I had to get them gzipped (also so
>> I could be sure I had got all of them when gunzipping them).
>>
>> So, my question is: is there anything I can do to be able to download
>> all the sequences? I.e. avoid timeouts, some easy, systematic way to
>> split my calls into much smaller calls, or something else?
>>
>> Thanks,
>>
>> Elfar
>
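
P.S. The other bit of scripting I keep ending up with is a retry wrapper around the POST, since the 500 read timeouts are intermittent. Again only a sketch, not something I consider a proper fix: the retry count and the pause are arbitrary, and $ARGV[0] is the same Query.xml as in postXML.pl above.

use strict;
use warnings;
use LWP::UserAgent;

# Retry a martservice POST a few times before giving up.
sub post_with_retries {
    my ($ua, $url, $xml, $tries) = @_;
    for my $attempt (1 .. $tries) {
        my $response = $ua->post($url, { query => $xml });
        return $response->decoded_content if $response->is_success;
        warn "attempt $attempt failed: " . $response->status_line . "\n";
        sleep 60 if $attempt < $tries;    # arbitrary pause before retrying
    }
    return undef;
}

my $ua  = LWP::UserAgent->new(timeout => 600);
my $xml = do { local $/; open(my $fh, '<', $ARGV[0]) or die $!; <$fh> };
my $data = post_with_retries($ua, "http://www.biomart.org/biomart/martservice?", $xml, 5);
die "giving up\n" unless defined $data;
print $data;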
