Hi Syed,

More comments inline...

On Wed, Jul 14, 2010 at 5:04 PM, Leandro Hermida
<[email protected]> wrote:
> On Wed, Jul 14, 2010 at 5:02 PM, Syed Haider <[email protected]> wrote:
>>
>>
>> On 14/07/2010 15:46, Leandro Hermida wrote:
>>>
>>> Hi again,
>>>
>>> In the new BioMart 0.8 will the SOAP and REST APIs have:
>>> - an option to return results in JSON or other serialized data structure
>>> form?
>>
>> tentative yes for results requests. For all other API calls (metadata
>> calls), a definite yes. For the former, there is very little point in
>> e.g. wrapping 1000 bytes of gene ids in 20,000 bytes of JSON.
>>
>
> good point, but many times you are returning much more than that,
> records with many attributes

Just to give you an example here: a complementary piece of software to
BioMart that I am using in one of my projects is Solr, a full-text search
engine and framework.  In some fundamental ways BioMart and Solr have
shared goals: they both have a web-based query interface.

With Solr you run a REST query like this:

http://myserver:8983/solr/select/?q=*&start=0&rows=10&wt=xml

<result name="response" numFound="390256" start="0">
    <doc>
      <str name="id">Q96GW9</str>
      <str name="organism">Homo sapiens (Human)</str>
      ...
    </doc>
    <doc>
      <str name="id">Q499X9</str>
      <str name="organism">Mus musculus (Mouse)</str>
    </doc>
    ...
</result>

or to return JSON...
http://myserver:8983/solr/select/?q=*&start=0&rows=10&wt=json

"response":{
  "numFound":390256,
  "start":0,
  "docs":[
        {
         "id":"Q96GW9",
         "organism":"Homo sapiens (Human)",
         ...
        },
        {
         "id":"Q499X9",
         "organism":"Mus musculus (Mouse)",
         ...
        },
        ...
  ]
}
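
To illustrate why this is convenient on the client side, here is a rough
sketch (in Python, just for brevity) of turning such a JSON response into
native rows.  The sample body is trimmed by hand from the example above, and
docs_to_rows is a made-up helper name, not something Solr or BioMart provides:

```python
import json

# Sample body shaped like the Solr wt=json response above (trimmed by hand;
# a real response carries more fields).
body = """
{
  "response": {
    "numFound": 390256,
    "start": 0,
    "docs": [
      {"id": "Q96GW9", "organism": "Homo sapiens (Human)"},
      {"id": "Q499X9", "organism": "Mus musculus (Mouse)"}
    ]
  }
}
"""


def docs_to_rows(payload, fields):
    """Turn the 'docs' list into a list of rows, one list per doc.

    Hypothetical helper for illustration only. Fields missing from a doc
    become None so that every row has the same length.
    """
    docs = payload["response"]["docs"]
    return [[doc.get(name) for name in fields] for doc in docs]


payload = json.loads(body)                        # one call, no manual splitting
rows = docs_to_rows(payload, ["id", "organism"])  # list of lists, like TSV rows
total = payload["response"]["numFound"]           # total hits, useful for paging

for row in rows:
    print("\t".join(str(value) for value in row))
```

The point is that the client gets a native structure from a single
deserialization call, with no format-specific line splitting.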

You can specify the return type and also choose how many docs to skip
and how many docs to return (LIMIT x,y).  This is the way I would
recommend for BioMart, don't you think?
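
As a small sketch of how a client would page through results this way
(Python again; page_url is a made-up helper, and only the q, start, rows and
wt parameters from the URLs above are assumed):

```python
from urllib.parse import urlencode


def page_url(base_url, query, page, page_size, fmt="json"):
    """Build a Solr-style select URL for a 0-based page number.

    Hypothetical helper for illustration only.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    params = {
        "q": query,
        "start": page * page_size,  # number of docs to skip
        "rows": page_size,          # number of docs to return
        "wt": fmt,                  # response format, e.g. xml or json
    }
    return base_url + "?" + urlencode(params)


# Third page of ten docs each, i.e. skip 20 and return 10.
url = page_url("http://myserver:8983/solr/select/", "*", page=2, page_size=10)
print(url)
```

Together with numFound from the response, the client can work out how many
pages there are without ever fetching the full result set.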

best,
Leandro

>
>>> - an option to return results sorted by some attribute(s)?
>>
>> no, that's a post-processing option and tends to be very expensive as it
>> needs all results to be collected in the first place. We can make it
>> optional though. The BioMart web interface would have this option for sure.
>>
>
> why not let the database do these things? (i.e. ... ORDER BY x1 ASC, y1
> DESC, z1 ASC) I noticed that in the current 0.7 you also do many
> things as post-processing in Perl, e.g. unique rows are filtered in Perl
> after the database returns results; why not just use SELECT DISTINCT
> ....?
>
>>> - an option to return results with LIMITs in full form i.e. start_row,
>>> end_row (for paging)?
>>
>> you will have a limit with an offset of zero, e.g. you can retrieve the
>> first 100, first 1000, first 10000 and so on.
>
> again why not let the database do it? ( e.g. ... LIMIT 100,500 )
>
>>
>> HTH,
>> Syed
>>
>>>
>>> best,
>>> Leandro
>>>
>>> On Wed, Jul 14, 2010 at 4:32 PM, Leandro Hermida
>>> <[email protected]>  wrote:
>>>>
>>>> Hi Syed,
>>>>
>>>> Since none of the BioMart APIs actually return results in a data
>>>> structure (they only return formatted files like TSV, etc.) I was trying
>>>> to be helpful and show other developers on this forum how they can go
>>>> about populating a Perl data structure from the results returned by
>>>> BioMart.
>>>>
>>>> It's not obvious after reading the docs, when you are getting started,
>>>> how you need to do this. One initially expects that the Perl API would
>>>> have some method call like ->getResults() which returns an array of
>>>> arrayrefs, or that the REST API would have an option to return e.g. a
>>>> JSON-serialized data structure that can be deserialized into a native
>>>> data structure for the language you are using.
>>>>
>>>> best,
>>>> Leandro
>>>>
>>>> On Wed, Jul 14, 2010 at 3:21 PM, Syed Haider<[email protected]>
>>>>  wrote:
>>>>>
>>>>> Hi Leandro,
>>>>>
>>>>> this is the only method that returns the results. What exactly are you
>>>>> after?
>>>>>
>>>>> Best
>>>>> Syed
>>>>>
>>>>> On 14/07/2010 13:14, Leandro Hermida wrote:
>>>>>>
>>>>>> Sorry, I forgot to post what I did before! For those of you who use the
>>>>>> BioMart APIs and want to get results back into Perl data structures,
>>>>>> here is the approach I use:
>>>>>>
>>>>>> If using the Perl API:
>>>>>>
>>>>>> use BioMart::Initializer;
>>>>>> use BioMart::Query;
>>>>>> use BioMart::QueryRunner;
>>>>>>
>>>>>> my $bm_initializer = BioMart::Initializer->new(
>>>>>>     registryFile =>    "/path/to/myRegistry.xml",
>>>>>>     action =>    'update',
>>>>>> );
>>>>>> my $bm_query = BioMart::Query->new(
>>>>>>     registry =>    $bm_initializer->getRegistry(),
>>>>>>     virtualSchemaName =>    'default'
>>>>>> );
>>>>>> $bm_query->setDataset('my_dataset');
>>>>>> $bm_query->addFilter('attr1', ['Q6LTE1']);
>>>>>> $bm_query->addAttribute('attr2');
>>>>>> $bm_query->addAttribute('attr3');
>>>>>> $bm_query->formatter('TSV');
>>>>>> my $bm_query_runner=BioMart::QueryRunner->new();
>>>>>> $bm_query_runner->uniqueRowsOnly(1);
>>>>>> $bm_query_runner->execute($bm_query);
>>>>>> open(RESULTS, '+>', \my $results) or die "$!\n";
>>>>>> $bm_query_runner->printResults(\*RESULTS);
>>>>>> seek(RESULTS, 0, 0);
>>>>>> while (<RESULTS>) {
>>>>>>     chomp;
>>>>>>     my @row_fields = split /\t/;
>>>>>>     # build up a data structure or process your fields here...
>>>>>> }
>>>>>> close(RESULTS);
>>>>>>
>>>>>>
>>>>>> Using the REST API:
>>>>>>
>>>>>> use LWP::UserAgent ();
>>>>>>
>>>>>> my $query_xml =<<XML;
>>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>>> <!DOCTYPE Query>
>>>>>> <Query virtualSchemaName="default" formatter="TSV" header="0"
>>>>>> uniqueRows="1" count="" datasetConfigVersion="0.7">
>>>>>>     <Dataset name="my_dataset" interface="default">
>>>>>>         <Filter name="attr1" value="Q6LTE1"/>
>>>>>>         <Attribute name="attr2" />
>>>>>>         <Attribute name="attr3" />
>>>>>>     </Dataset>
>>>>>> </Query>
>>>>>> XML
>>>>>>
>>>>>> my $ua = LWP::UserAgent->new();
>>>>>> my $response =
>>>>>> $ua->post('http://myserver.mydomain:9002/biomart/martservice',
>>>>>> [ query =>    $query_xml ]);
>>>>>> if ($response->is_success and $response->decoded_content !~
>>>>>> /BioMart::Exception/i) {
>>>>>>     open(RESULTS, '<', \$response->decoded_content) or die "$!\n";
>>>>>>     while (<RESULTS>) {
>>>>>>         chomp;
>>>>>>         my @row_fields = split /\t/;
>>>>>>         # build up a data structure or process your fields here...
>>>>>>     }
>>>>>>     close(RESULTS);
>>>>>> }
>>>>>> else {
>>>>>>     die $response->decoded_content, "\n";
>>>>>> }
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 10, 2010 at 12:03 AM, Syed Haider<[email protected]>
>>>>>>  wrote:
>>>>>>>
>>>>>>> Hi Leandro,
>>>>>>>
>>>>>>> The data structure representation of results is not returned by the
>>>>>>> API. If you are feeling adventurous, please feel free to look into
>>>>>>> the lib/BioMart/Formatter/ directory for the appropriate formatter
>>>>>>> that you are interested in.
>>>>>>>
>>>>>>>
>>>>>>> Best
>>>>>>> Syed
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 09/06/2010 17:51, Leandro Hermida wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I was wondering if there is a way using the Perl API to get results
>>>>>>>> in a Perl data structure and, if possible, row by row.  For example,
>>>>>>>> each row returned as an array or arrayref.  It seems inefficient to
>>>>>>>> take printResults() and have to break everything up again when I know
>>>>>>>> somewhere in the Perl API it was doing the reverse...
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Leandro
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>
