Hi Elfar,
the following may need a few more lines of code, but it will work
with your existing workflow. What you may consider doing is to first
retrieve all the gene ids or transcript ids, depending upon which
sequence type you are interested in. You can do this either from the
web interface or from your script. Once you have these, split them
into smaller groups, say 1000 each, and then send multiple queries
with these ids as filter values, along the lines of the sketch below.
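For example, something along these lines (an untested sketch using the
BioMart Perl API, as in your script below; the input file gene_ids.txt,
the batch size and the "ensembl_gene_id" filter name are placeholders
you would adapt to your setup):

use strict;
use warnings;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# same registry file as in your script below
my $confFile = "/home/projects/ensembl/biomart-perl/conf/apiExampleRegistry.xml";
my $initializer = BioMart::Initializer->new('registryFile' => $confFile,
                                            'action'       => 'cached');
my $registry = $initializer->getRegistry;

# read the full list of gene ids retrieved beforehand, one id per line
# (gene_ids.txt is a placeholder file name)
open(my $fh, '<', 'gene_ids.txt') or die "Cannot open gene_ids.txt: $!";
chomp(my @ids = <$fh>);
close($fh);

my $query_runner = BioMart::QueryRunner->new();
$query_runner->uniqueRowsOnly(1);

# send one query per batch of 1000 ids, passing the batch as filter values
while (my @batch = splice(@ids, 0, 1000)) {
    my $query = BioMart::Query->new('registry' => $registry,
                                    'virtualSchemaName' => 'default');
    $query->setDataset("hsapiens_gene_ensembl");
    $query->addFilter("ensembl_gene_id", \@batch);
    $query->addAttribute("ensembl_gene_id");
    $query->addAttribute("ensembl_transcript_id");
    $query->addAttribute("coding");
    $query->addAttribute("external_gene_id");
    $query->formatter("FASTA");

    $query_runner->execute($query);
    $query_runner->printResults();
}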
Hope this will do the trick.
Best,
Syed
Elfar Torarinsson wrote:
Hi Syed,
thanks for your answer. I have a couple of issues with that solution.
First of all, I have often experienced that this feature fails, that
is, I never receive the mail, especially when requesting large amounts
of data. The other thing is that I wanted to be able to do this
automatically, in a cronjob for example, and although I assume this is
possible, it will require somewhat more scripting than I was planning
on doing for this (unless there is some smart option here I'm
overlooking).
Best,
Elfar
On Sat, Jan 30, 2010 at 3:47 PM, Syed Haider <[email protected]> wrote:
Hi Elfar,
the best option is to download them using the web browser's Export
(email) option. This will compile the results on the server side and
then send you a link by email.
Best,
Syed
Elfar Torarinsson wrote:
Hi,
I was trying to automate regular downloads of human CDS (and UTR)
sequences using BioMart. I have tried it using the Perl script
generated by BioMart:
use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile = "/home/projects/ensembl/biomart-perl/conf/apiExampleRegistry.xml";
my $action = 'cached';
my $initializer = BioMart::Initializer->new('registryFile' => $confFile,
                                            'action'       => $action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry' => $registry,
                                'virtualSchemaName' => 'default');
$query->setDataset("hsapiens_gene_ensembl");
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("coding");
$query->addAttribute("external_gene_id");
$query->formatter("FASTA");

my $query_runner = BioMart::QueryRunner->new();
# to obtain unique rows only
$query_runner->uniqueRowsOnly(1);
$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
This only retrieves a few sequences and then starts returning
"Problems with the web server: 500 read timeout".
I have also tried posting the XML using LWP in Perl; this downloads
more sequences, but it also stops after a while, before downloading
all of them:
use strict;
use LWP::UserAgent;

open(FH, $ARGV[0]) || die("\nUsage: perl postXML.pl Query.xml\n\n");
my $xml;
while (<FH>) {
    $xml .= $_;
}
close(FH);

my $path = "http://www.biomart.org/biomart/martservice?";
my $request = HTTP::Request->new("POST", $path, HTTP::Headers->new(),
                                 'query=' . $xml . "\n");
my $ua = LWP::UserAgent->new;
$ua->timeout(30000000);
my $response;
$ua->request($request,
    sub {
        my ($data, $response) = @_;
        if ($response->is_success) {
            print "$data";
        }
        else {
            warn("Problems with the web server: " . $response->status_line);
        }
    }, 500);
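For reference, the Query.xml is essentially the XML version of the API
query above, roughly along these lines (the exact header/uniqueRows
attributes may differ from what the web interface exports):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="1" count="">
    <Dataset name="hsapiens_gene_ensembl" interface="default">
        <Attribute name="ensembl_gene_id" />
        <Attribute name="ensembl_transcript_id" />
        <Attribute name="coding" />
        <Attribute name="external_gene_id" />
    </Dataset>
</Query>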
I have managed to download all the sequences using the browser before,
but it required several tries and I had to get them gzipped (also so
I could be sure I got all of them when gunzipping).
So, my question is: is there anything I can do to download all the
sequences reliably? That is, avoid the timeouts, find some easy,
systematic way to split my calls into much smaller ones, or something
else?
Thanks,
Elfar