Hi Syed,

More comments inline...
On Wed, Jul 14, 2010 at 5:04 PM, Leandro Hermida <[email protected]> wrote:
> On Wed, Jul 14, 2010 at 5:02 PM, Syed Haider <[email protected]> wrote:
>>
>>
>> On 14/07/2010 15:46, Leandro Hermida wrote:
>>>
>>> Hi again,
>>>
>>> In the new BioMart 0.8 will the SOAP and REST APIs have:
>>> - an option to return results in JSON or other serialized data structure
>>> form?
>>
>> tentative yes for results request. For all other API calls (metadata calls),
>> a definite yes. For the former, there is very little point to e.g. wrap 1000
>> bytes of gene ids in 20,000 bytes of JSON.
>>
>
> good point, but many times you are returning much more than that,
> records with many attributes

Just to give you an example here: a complementary piece of software to
BioMart that I am using in one of my projects is Solr, a full-text search
engine and framework. In some fundamental ways BioMart and Solr share
goals: both have a web-based query interface.

With Solr you run a REST query like this:

http://myserver:8983/solr/select/?q=*&start=0&rows=10&wt=xml

<result name="response" numFound="390256" start="0">
  <doc>
    <str name="id">Q96GW9</str>
    <str name="organism">Homo sapiens (Human)</str>
    ...
  </doc>
  <doc>
    <str name="id">Q499X9</str>
    <str name="organism">Mus musculus (Mouse)</str>
  </doc>
  ...
</result>

or to return JSON...

http://myserver:8983/solr/select/?q=*&start=0&rows=10&wt=json

"response":{
  "numFound":390256,
  "start":0,
  "docs":[
    {
      "id":"Q96GW9",
      "organism":"Homo sapiens (Human)",
      ...
    },
    {
      "id":"Q499X9",
      "organism":"Mus musculus (Mouse)",
      ...
    },
    ...
  ]
}

You can specify the return type and also choose how many docs to skip and
how many docs to return (LIMIT x,y). This is the way I would recommend for
BioMart, don't you think?

best,
Leandro

>
>>> - an option to return results sorted by some attribute(s)?
>>
>> no, that's a post-processing option and tends to be very expensive as it
>> needs all results to be collected in the first place. we can make it
>> optional though.
>> BioMart web interface would have this option for sure.
>>
>
> why not let the database do these things? (i.e. ... ORDER BY x1 ASC, y1
> DESC, z1 ASC ) I noticed that also in the current 0.7 you do many
> things post-processed in Perl, e.g. unique rows are processed in Perl
> after returning database results, why not just use SELECT DISTINCT
> ....?
>
>>> - an option to return results with LIMITs in full form i.e. start_row,
>>> end_row (for paging)?
>>
>> you will have limit with an offset of zero, e.g. you can retrieve the
>> first 100, first 1000, first 10000 and so on.
>
> again why not let the database do it? ( e.g. ... LIMIT 100,500 )
>
>>
>> HTH,
>> Syed
>>
>>>
>>> best,
>>> Leandro
>>>
>>> On Wed, Jul 14, 2010 at 4:32 PM, Leandro Hermida
>>> <[email protected]> wrote:
>>>>
>>>> Hi Syed,
>>>>
>>>> Since none of the BioMart APIs actually return results in a data
>>>> structure (they only return formatted files like TSV, etc) I was trying
>>>> to be helpful and show other developers on this forum how they can go
>>>> about populating a Perl data structure from the results returned by
>>>> BioMart.
>>>>
>>>> It's not obvious after reading the docs and when you get started how
>>>> you need to do this. One initially expects that in the APIs there
>>>> would be, e.g. in the Perl API, some method call ->getResults()
>>>> which returns an @array of arrayrefs structure, or in the REST API that
>>>> there would be an option to return e.g. a JSON serialized data
>>>> structure that can be unserialized into a native data structure for
>>>> the language you are using.
>>>>
>>>> best,
>>>> Leandro
>>>>
>>>> On Wed, Jul 14, 2010 at 3:21 PM, Syed Haider <[email protected]>
>>>> wrote:
>>>>>
>>>>> Hi Leandro,
>>>>>
>>>>> this is the only method that returns the results. What exactly are you
>>>>> after?
>>>>>
>>>>> Best
>>>>> Syed
>>>>>
>>>>> On 14/07/2010 13:14, Leandro Hermida wrote:
>>>>>>
>>>>>> Sorry forgot to post what I did before!
>>>>>> For those of you who use the
>>>>>> BioMart APIs and want to get results back into Perl data structures,
>>>>>> here is the approach I use:
>>>>>>
>>>>>> If using the Perl API:
>>>>>>
>>>>>> use BioMart::Initializer;
>>>>>> use BioMart::Query;
>>>>>> use BioMart::QueryRunner;
>>>>>>
>>>>>> my $bm_initializer = BioMart::Initializer->new(
>>>>>>     registryFile => "/path/to/myRegistry.xml",
>>>>>>     action       => 'update',
>>>>>> );
>>>>>> my $bm_query = BioMart::Query->new(
>>>>>>     registry          => $bm_initializer->getRegistry(),
>>>>>>     virtualSchemaName => 'default'
>>>>>> );
>>>>>> $bm_query->setDataset('my_dataset');
>>>>>> $bm_query->addFilter('attr1', ['Q6LTE1']);
>>>>>> $bm_query->addAttribute('attr2');
>>>>>> $bm_query->addAttribute('attr3');
>>>>>> $bm_query->formatter('TSV');
>>>>>> my $bm_query_runner = BioMart::QueryRunner->new();
>>>>>> $bm_query_runner->uniqueRowsOnly(1);
>>>>>> $bm_query_runner->execute($bm_query);
>>>>>> # capture the printed TSV in an in-memory read/write filehandle
>>>>>> open(RESULTS, '+>', \my $results) or die "$!\n";
>>>>>> $bm_query_runner->printResults(\*RESULTS);
>>>>>> seek(RESULTS, 0, 0);
>>>>>> while (<RESULTS>) {
>>>>>>     chomp;
>>>>>>     my @row_fields = split /\t/;
>>>>>>     # build up a data structure or process your fields here...
>>>>>> }
>>>>>> close(RESULTS);
>>>>>>
>>>>>>
>>>>>> Using the REST API:
>>>>>>
>>>>>> use LWP::UserAgent ();
>>>>>>
>>>>>> my $query_xml =<<XML;
>>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>>> <!DOCTYPE Query>
>>>>>> <Query virtualSchemaName="default" formatter="TSV" header="0"
>>>>>> uniqueRows="1" count="" datasetConfigVersion="0.7">
>>>>>>     <Dataset name="my_dataset" interface="default">
>>>>>>         <Filter name="attr1" value="Q6LTE1"/>
>>>>>>         <Attribute name="attr2" />
>>>>>>         <Attribute name="attr3" />
>>>>>>     </Dataset>
>>>>>> </Query>
>>>>>> XML
>>>>>>
>>>>>> my $ua = LWP::UserAgent->new();
>>>>>> my $response =
>>>>>>     $ua->post('http://myserver.mydomain:9002/biomart/martservice',
>>>>>>               [ query => $query_xml ]);
>>>>>> if ($response->is_success and $response->decoded_content !~
>>>>>>     /BioMart::Exception/i) {
>>>>>>     open(RESULTS, '<', \$response->decoded_content) or die "$!\n";
>>>>>>     while (<RESULTS>) {
>>>>>>         chomp;
>>>>>>         my @row_fields = split /\t/;
>>>>>>         # build up a data structure or process your fields here...
>>>>>>     }
>>>>>>     close(RESULTS);
>>>>>> }
>>>>>> else {
>>>>>>     die $response->decoded_content, "\n";
>>>>>> }
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 10, 2010 at 12:03 AM, Syed Haider <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Leandro,
>>>>>>>
>>>>>>> The data structure representation of results is not returned by the
>>>>>>> API. If you are feeling adventurous please feel free to look into the
>>>>>>> lib/BioMart/Formatter/ directory for the appropriate formatter that
>>>>>>> you are interested in.
>>>>>>>
>>>>>>>
>>>>>>> Best
>>>>>>> Syed
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 09/06/2010 17:51, Leandro Hermida wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I was wondering if there is a way using the Perl API to get results
>>>>>>>> in a Perl data structure and, if possible, row by row. For example
>>>>>>>> each row returned as an array or arrayref.
>>>>>>>> It seems inefficient to take
>>>>>>>> printResults() and have to break everything up again when I know
>>>>>>>> somewhere in the Perl API it was doing the reverse...
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Leandro
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>
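
For reference, the TSV-splitting loop used in both quoted examples can be wrapped into a small helper that returns an array of arrayrefs, roughly the ->getResults() shape wished for in the thread. This is only a sketch under stated assumptions: parse_tsv_rows is a hypothetical name, not part of the BioMart Perl API, and the sample string is made-up data standing in for real output captured from printResults() or a martservice response.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of the BioMart API): turn TSV text into
# a reference to an array of arrayrefs, one arrayref per result row.
sub parse_tsv_rows {
    my ($tsv_text) = @_;
    my @rows;
    # Read the string line by line through an in-memory filehandle.
    open(my $fh, '<', \$tsv_text)
        or die "Cannot open in-memory handle: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';    # skip blank lines
        # A limit of -1 makes split keep trailing empty fields.
        push @rows, [ split /\t/, $line, -1 ];
    }
    close($fh);
    return \@rows;
}

# Made-up sample data; a real caller would pass the captured TSV output.
my $sample = "Q6LTE1\tvalueA\tvalueB\nQ499X9\tvalueC\t\n";
my $rows   = parse_tsv_rows($sample);
for my $row (@$rows) {
    print join(' | ', @$row), "\n";
}
```

The -1 limit on split matters when the last attribute of a row is empty: without it Perl drops trailing empty fields, so rows would come back with differing numbers of columns.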
