Re: [mart-dev] Canned queries 2 - How to modify results set before display

Syed Haider Wed, 19 Mar 2008 10:18:56 -0700

Hi Roger, 

Apologies that I couldn't get back to you on this earlier. I was extremely
busy with release preparations and some ongoing developments on our side.


For this release, we do not anticipate any changes in the query compiler.
The list of enhancements is mainly bug fixes, so nothing fancy, except
that MartBuilder's full stable version will be released.

There won't be any serious outstanding bugs in this release except the
ones which may arise after release :)

A list is hard to compile, as we fix most of the bugs as we go and
usually commit the fixes to the prior release branch as well, so that
users can benefit instantly.

If there is anything specific you have in mind, please do report.

The interface helper functions you mentioned are of course very
important, and this is how we learn more, from constructive feedback
such as yours. However, the current architecture won't support more than
header.tt and footer.tt. We are going to make major improvements to
our interface from 0.8 onwards, and that will feature detailed control
over hacking it in a decent manner :)

For now, we can help you hack the current version.

cheers
syed


On Wed, 2008-03-19 at 16:49 +0000, Roger Hull wrote:
> Hi Arek,
> 
> Thanks for the positive reply. It is particularly important to us that 
> you have the policy of maintaining backward compatibility.
> 
> You say of counts: "For large datasets this frequently ends up taking 
> longer than the preview of the results for the actual query" - I don't 
> see this as a big problem, as the queries seem to run fast anyway. It is 
> more important to be able to get the answer, even if it takes a bit 
> longer.
> 
> You answered preliminary ETA is April for BioMart 0.7, but didn't give 
> any comment about a published list of enhancements and bug fixes for 
> that release. Such a list would be really helpful - a list of known or 
> suspected bugs would save users time if they hit an already known bug, 
> as the first assumption of a user is that it is something he or she has 
> done wrong, and sometimes only after much time spent investigating will 
> it get reported to the mart-dev list.
> 
> For enhancements, I am particularly interested in extension of your 
> queries to cover the standard types of query supported by SQL, for 
> example "GROUP BY" (in conjunction with counts) and WHERE clauses that 
> have an expression involving more than one column (attribute). These 
> constructions often provide quick answers to common questions about the 
> datasets, like "how many proteins are there of each of the following 
> types...".
> 
> I await with interest a further response from Syed.
> 
> Regards,
> Roger
> 
> Arek Kasprzyk wrote:
> >
> > On 12-Mar-08, at 3:44 PM, Roger Hull wrote:
> >
> >> Hi Syed,
> >
> > Hi Roger,
> > I'll let Syed reply to you in detail to your questions but let me put 
> > this into the context of our future developments.
> >
> >>
> >>
> >> Thanks for the advice. I think we see quite big risks in doing 
> >> changes which depend on our understanding too much of how your code 
> >> works. This could present problems in maintaining the code we have 
> >> written, or changed, in the future, when BioMart is upgraded or we 
> >> need to add new features.
> >>
> >> I would like to ask some questions about BioMart upgrades:
> >>
> >> (A) Will you maintain backward compatibility between a BioMart 0.6 
> >> installation which gets its data from a remote BioMart MartService, 
> >> when the remote BioMart is upgraded to 0.7,..., 1.0, etc ?
> >
> > We do currently maintain the compatibility between our 0.6 service and 
> > all other marts which run earlier versions. The central server is a 
> > 'translation point' which maintains backward compatibility.
> > Not all of the BioMart servers accessible from our central server are 
> > 0.6 but you can query them in uniform fashion. It is our intention to 
> > maintain this compatibility in a similar way in the future.
> >
> >>
> >> (B) Do you have a date for BioMart 0.7, and have you published a 
> >> list of enhancements and bug fixes for that release?
> >
> > the preliminary ETA is April
> >
> >>
> >>
> >> (C) There seem to be a number of BioMart installations where people 
> >> have modified your code. As I said, I'm reluctant to follow this 
> >> path, but I wonder if you could consider adding functions, hooks, or 
> >> similar mechanisms in your code so that the BioMart behaviour can be 
> >> changed in various ways without modifying your code? (The files 
> >> header.tt and footer.tt are useful, but it is limited what can be 
> >> done by adding code to these files.)
> >
> > yes, the web code will undergo a major re-organization for 0.8 which 
> > will coincide with the release of the new configuration system. One of 
> > the main goals of this 're-organization' is to provide a flexible 
> > framework by which people can extend the code, both in terms of the web 
> > GUI and things like visualization etc. This is still at a very early 
> > stage so any suggestions are very welcome.
> >
> >>
> >>  A couple of suggestions, based on what I have been wanting to do:
> >>     (C1) I would like to call your AJAX functions for my own 
> >> purposes. But your function doAjaxMagic(toDo) only supports the two 
> >> values of toDo = 'countByAjax' and 'resultsByAjax'. Could you provide 
> >> a general purpose function which anyone could use? - with an argument 
> >> to specify the URL which will handle the request and another to 
> >> supply a function to handle the results (preferably supporting POST 
> >> as well as GET). Then use this function internally to implement 
> >> doAjaxMagic.
> >> If I modify your current function, or make a modified copy, then I 
> >> have to maintain this code in the future if you change the function 
> >> internally (e.g. to support new browser versions).
> >>     (C2) Could you implement, and document, a callback function in 
> >> perl, which has as input arguments the results of a query in a 
> >> parsable format (maybe XML), so that by default this function returns 
> >> the input results unchanged. Then if the user wants to filter or 
> >> otherwise modify the result data, this can be done by adding code to 
> >> this function, and modifying the data before returning it to the 
> >> caller. [OK, it's not quite as simple as this, because you batch the 
> >> results data and return a certain number of result rows at a time, 
> >> and if some are filtered out, some more have to be processed to make 
> >> up the number, but I'm sure a solution can be found.]
> >> When I looked at your code to see how and where I might do this type 
> >> of result modification, so that various formats (HTML, CSV, etc) and 
> >> various output methods (display in MartView, download to a file), are 
> >> all supported, it needed quite a lot of study of your code, and might 
> >> well present difficulties to maintain. You have a perl API based upon 
> >> BioMart::QueryRunner, which you use internally in your code, but the 
> >> only way I can see to get at the results using this API is 
> >> $query_runner->printResults(), which gives already formatted results. 
> >> I want to get at the results before formatting, so I don't have to 
> >> parse data formatted in various ways, and reformat the data after 
> >> modification.
> >> Apart from filtering the results, another common requirement would be 
> >> to add links to fields when the results are to be formatted as HTML, 
> >> or to add another column of fields fetched from a local non-BioMart 
> >> database, using one of the BioMart attributes in the results as an 
> >> index to retrieve data from this database.
> >>
> >> (D) As I mentioned earlier, the Count returned by BioMart is "how many 
> >> rows in the main table of the dataset match your filters so far". In 
> >> the case of PRIDE, this is the number of experiments. But often I need 
> >> to know the number of rows returned from the query, not the number of 
> >> rows from the main table - for example I want to know the number of 
> >> distinct proteins or peptides which match my filters. Will you support 
> >> a more usual "number of rows returned" count in the future?
> >
> > There are two problems here: one is the counts for multiple mains, the 
> > second is the count of the actual number of rows returned. As far as the 
> > mains are concerned (which I think would solve your protein problem 
> > for PRIDE), we do intend to provide a proper count for each main table, 
> > i.e. your query has selected this many experiments, proteins etc., for 
> > however many mains you happen to have there, as opposed to what we 
> > support now, which is the top main. As far as the number of rows is 
> > concerned, the implementation is really trivial for single datasets and 
> > the only concern there was the unpredictable performance. For large 
> > datasets this frequently ends up taking longer than the preview of the 
> > results for the actual query, so it does not always work nicely in an 
> > interactive environment. A bigger challenge is to provide a row count 
> > for federated queries. We are currently thinking through the possible 
> > scenarios for all of them.
> >
> >
> > a.
> >
> >
> >>
> >>
> >> Regards,
> >> Roger.
> >>
> >
-- 
======================================
Syed Haider.
EMBL-European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
======================================
