Hi Syed, that sounds like a very doable solution; I'll try it.
Appreciate your suggestions :)

Best,
-elfar

On Sat, Jan 30, 2010 at 4:52 PM, Syed Haider <[email protected]> wrote:
> Hi Elfar,
>
> the following may need writing a few more lines of code, but it will work
> with your existing workflows. What you may consider doing is to first
> retrieve all the gene ids or transcript ids, depending upon which sequence
> type you are interested in. You can do this either from the web interface
> or from your script. Once you have these, split them into smaller groups,
> say 1000 each, and then send multiple queries with these ids as filter
> values.
>
> Hope this will do the trick.
>
> Best,
> Syed
>
>
> Elfar Torarinsson wrote:
>>
>> Hi Syed,
>>
>> thanks for your answer. I have a couple of issues with that solution.
>> First of all, I have often experienced that this feature fails, that is,
>> I never receive the mail, especially when requesting large amounts of
>> data. The other thing is that I wanted to be able to do this
>> automatically, in a cronjob for example, and although I assume this is
>> possible, it will require somewhat more scripting than I was planning
>> on doing for this (unless there is some smart option here I'm
>> overlooking).
>>
>> Best,
>>
>> Elfar
>>
>>
>> On Sat, Jan 30, 2010 at 3:47 PM, Syed Haider <[email protected]>
>> wrote:
>>>
>>> Hi Elfar,
>>>
>>> the best option is to download them using the web browser's Export
>>> (email) option. This will compile the results on the server side and
>>> then send you a link by email.
>>>
>>> Best,
>>> Syed
>>>
>>>
>>> Elfar Torarinsson wrote:
>>>>
>>>> Hi,
>>>>
>>>> I was trying to automate regular downloads of human CDS (and UTRs)
>>>> using BioMart. I have tried it using the Perl script generated at
>>>> BioMart:
>>>>
>>>> use strict;
>>>> use BioMart::Initializer;
>>>> use BioMart::Query;
>>>> use BioMart::QueryRunner;
>>>>
>>>> my $confFile =
>>>>     "/home/projects/ensembl/biomart-perl/conf/apiExampleRegistry.xml";
>>>> my $action = 'cached';
>>>> my $initializer = BioMart::Initializer->new('registryFile'=>$confFile,
>>>>                                             'action'=>$action);
>>>> my $registry = $initializer->getRegistry;
>>>>
>>>> my $query =
>>>>     BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');
>>>>
>>>> $query->setDataset("hsapiens_gene_ensembl");
>>>> $query->addAttribute("ensembl_gene_id");
>>>> $query->addAttribute("ensembl_transcript_id");
>>>> $query->addAttribute("coding");
>>>> $query->addAttribute("external_gene_id");
>>>>
>>>> $query->formatter("FASTA");
>>>>
>>>> my $query_runner = BioMart::QueryRunner->new();
>>>> # to obtain unique rows only
>>>> $query_runner->uniqueRowsOnly(1);
>>>>
>>>> $query_runner->execute($query);
>>>> $query_runner->printHeader();
>>>> $query_runner->printResults();
>>>> $query_runner->printFooter();
>>>>
>>>> This only retrieves a few sequences and then starts returning
>>>> "Problems with the web server: 500 read timeout"
>>>>
>>>> I have also tried posting the XML using LWP in Perl; this downloads
>>>> more sequences, but it also stops after a while, before downloading
>>>> all the sequences:
>>>>
>>>> use strict;
>>>> use LWP::UserAgent;
>>>>
>>>> open (FH, $ARGV[0]) || die ("\nUsage: perl postXML.pl Query.xml\n\n");
>>>> my $xml;
>>>> while (<FH>) {
>>>>     $xml .= $_;
>>>> }
>>>> close(FH);
>>>>
>>>> my $path = "http://www.biomart.org/biomart/martservice?";
>>>> my $request =
>>>>     HTTP::Request->new("POST",$path,HTTP::Headers->new(),'query='.$xml."\n");
>>>> my $ua = LWP::UserAgent->new;
>>>> $ua->timeout(30000000);
>>>> my $response;
>>>>
>>>> $ua->request($request,
>>>>     sub {
>>>>         my ($data, $response) = @_;
>>>>         if ($response->is_success) {
>>>>             print "$data";
>>>>         }
>>>>         else {
>>>>             warn ("Problems with the web server: ".$response->status_line);
>>>>         }
>>>>     }, 500);
>>>>
>>>> I have managed to download all the sequences using the browser before,
>>>> but it required several tries and I had to get them gzipped (also so
>>>> I could be sure I got all of them when gunzipping them).
>>>>
>>>> So, my question is: is there anything I can do to be able to download
>>>> all the sequences? I.e. avoid timeouts, some easy, systematic way to
>>>> split my calls into much smaller calls, or something else?
>>>>
>>>> Thanks,
>>>>
>>>> Elfar
>
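
A minimal sketch of the batching Syed describes, reusing the biomart-perl API
from the script in the thread. The id file transcript_ids.txt, the chunk size
of 1000, and the ensembl_transcript_id filter name are assumptions for
illustration (not confirmed in the thread); the id list itself could come from
a one-off web-interface export or a separate attribute-only query.

use strict;
use warnings;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# one Ensembl transcript id per line; exported once from the web interface
# or produced by a separate attribute-only query (placeholder file name)
open(my $ids_fh, '<', 'transcript_ids.txt') or die "cannot open id list: $!";
chomp(my @ids = <$ids_fh>);
close($ids_fh);

my $confFile =
    "/home/projects/ensembl/biomart-perl/conf/apiExampleRegistry.xml";
my $initializer = BioMart::Initializer->new('registryFile' => $confFile,
                                            'action'       => 'cached');
my $registry = $initializer->getRegistry;

my $chunk_size = 1000;
while (my @chunk = splice(@ids, 0, $chunk_size)) {
    my $query = BioMart::Query->new('registry'          => $registry,
                                    'virtualSchemaName' => 'default');
    $query->setDataset("hsapiens_gene_ensembl");
    # restrict this query to the current batch of ids (filter name as used
    # by scripts generated from the Ensembl mart; treat it as an assumption)
    $query->addFilter("ensembl_transcript_id", [@chunk]);
    $query->addAttribute("ensembl_gene_id");
    $query->addAttribute("ensembl_transcript_id");
    $query->addAttribute("coding");
    $query->formatter("FASTA");

    my $query_runner = BioMart::QueryRunner->new();
    $query_runner->uniqueRowsOnly(1);
    $query_runner->execute($query);
    $query_runner->printResults();   # FASTA for this batch goes to STDOUT
}

Each batch is a short-lived query, so no single request has to hold the
connection long enough to hit the 500 read timeout.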

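The same batching can also be sketched against the martservice endpoint used
in postXML.pl, building one XML query per chunk and passing the ids as a
comma-separated Filter value. Again, transcript_ids.txt, the chunk size, and
the filter name are illustrative assumptions.

use strict;
use warnings;
use LWP::UserAgent;

# same placeholder id list as above, one transcript id per line
open(my $ids_fh, '<', 'transcript_ids.txt') or die "cannot open id list: $!";
chomp(my @ids = <$ids_fh>);
close($ids_fh);

my $path = "http://www.biomart.org/biomart/martservice";
my $ua   = LWP::UserAgent->new(timeout => 300);

while (my @chunk = splice(@ids, 0, 1000)) {
    my $id_list = join(',', @chunk);
    my $xml = <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="1" count="">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="ensembl_transcript_id" value="$id_list"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Attribute name="coding"/>
  </Dataset>
</Query>
XML
    # POST the batch as a 'query' form parameter, as postXML.pl does
    my $response = $ua->post($path, { query => $xml });
    if ($response->is_success) {
        print $response->content;            # FASTA for this batch
    }
    else {
        warn "batch failed: " . $response->status_line . "\n";
        # a cron job would retry this batch instead of giving up
    }
}

Keeping each batch small enough to finish well under the server timeout, and
retrying any failed batch, is what makes this workable as an unattended cron
job.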