RE: [mart-dev] Joining two datasets

Syed Haider Thu, 29 May 2008 07:23:13 -0700

Denny, we need to investigate this further. If you want to place a
terminate statement in your code, feel free to. Just to let you know
that data from DS2 is retrieved in batches which are then fed into DS1
as filters (IN clause). Hence, if a certain batch of DS1 doesn't return
anything from DS2, it doesn't necessarily mean that execution should
stop.
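
To illustrate the batching described above, here is a rough sketch in
Python. It is not BioMart's actual (Perl) code: fetch_batch,
query_linked, in_clause and linked_rows are made-up names, and the "?"
placeholder style is an assumption about the database driver. The idea
is to keep iterating when one batch matches nothing, to stop only when
the driving dataset runs out of keys, and never to build an empty IN
clause.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Clause = Tuple[str, List[object]]


def in_clause(column: str, keys: Sequence[object]) -> Optional[Clause]:
    """Build a parameterised IN filter, or return None when there are no keys."""
    if not keys:
        return None  # never emit something like "col IN ('')"
    placeholders = ", ".join("?" for _ in keys)
    return f"{column} IN ({placeholders})", list(keys)


def linked_rows(
    fetch_batch: Callable[[int, int], Sequence[object]],
    query_linked: Callable[[str, List[object]], Sequence[tuple]],
    column: str,
    batch_size: int = 50000,
) -> Iterator[tuple]:
    """Feed key batches from one dataset into the other as IN filters."""
    offset = 0
    while True:
        keys = fetch_batch(offset, batch_size)
        clause = in_clause(column, keys)
        if clause is None:
            break  # driving dataset exhausted: the only stop condition
        offset += len(keys)
        sql_filter, params = clause
        # An empty result for this batch is fine; later batches may still match.
        yield from query_linked(sql_filter, params)
```

With this shape, a batch that links to nothing simply yields no rows,
and the query against the linked dataset is skipped entirely once the
key batch comes back empty.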


syed


On Thu, 2008-05-29 at 09:44 -0400, Chan, Denny (NIH/NCI) [C] wrote:
> Hi Syed,
> 
> We provide a couple of datasets for users to search on.  They can pick
> any one as the first dataset, apply filters, and link it with another
> dataset.
> 
> It is hard to control the workflow so that a filter is applied only on
> the second dataset.
> 
> The filter we tested with is just an example.  It is not a default filter.
> 
> Thanks,
> Denny
> 
> -----Original Message-----
> From: Syed Haider [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, May 29, 2008 9:31 AM
> To: Chan, Denny (NIH/NCI) [C]
> Cc: [email protected]
> Subject: RE: [mart-dev] Joining two datasets
> 
> Denny,
> 
> Is there any special reason you can't put this filter on the second
> dataset?
> 
> All you need to do is set a default filter for the dataset config
> (using MartEditor). This filter will be preselected whenever a user
> selects this dataset, regardless of its order. Does that solve your
> problem?
> 
> syed
> 
> 
> On Thu, 2008-05-29 at 09:20 -0400, Chan, Denny (NIH/NCI) [C] wrote:
> > Hi Syed,
> > 
> > Two weeks ago, I asked a question about joining two large datasets
> > that took a long time to execute.
> > Your suggestion was to switch the order of the query.  It did improve
> > the performance.
> > 
> > However, one of our use cases requires putting a filter on the first
> > dataset.
> > It seems that if there is a filter on the first dataset, the execution
> > time is long.
> > 
> > I am not concerned about performance at this point.  I can use the
> > "email notification" option to wait for the results.
> > 
> > But it will eventually give me an exception error. From the log, the
> > program printed out "Got no results" after executing the SQL for the
> > first dataset.
> > Can I terminate the iteration at that point?
> > 
> > Thanks,
> > Denny 
> > 
> > 
> > 
> > -----Original Message-----
> > From: Syed Haider [mailto:[EMAIL PROTECTED] 
> > Sent: Thursday, May 29, 2008 8:50 AM
> > To: Chan, Denny (NIH/NCI) [C]
> > Cc: [email protected]
> > Subject: Re: [mart-dev] Joining two datasets
> > 
> > Hi Denny,
> > 
> > Can you execute the same query in reverse order, by swapping the
> > dataset order? What happens when you do this?
> > 
> > syed
> > 
> > 
> > 
> > 
> > On Thu, 2008-05-29 at 08:02 -0400, Chan, Denny (NIH/NCI) [C] wrote:
> > > When joining two datasets, BioMart ran a batch iteration for both
> > > datasets. But when it reached the end of the first dataset, it still
> > > tried to query the second dataset with an invalid SQL statement.
> > > Here is what is in the log file:
> > > 
> > > 
> > > ========================================================================
> > > BioMart.Dataset.TableSet:735:WARN> QUERY SQL:  SELECT main.seqid, main.charge FROM cpas2biomart.peptidesview__peptidesview__main main LIMIT 50000 OFFSET 9815643
> > > BioMart.DatasetI:1175:DEBUG> Got no results
> > > BioMart.DatasetI:1261:DEBUG> Attribute hash
> > > BioMart.DatasetI:1262:DEBUG> Before hash: 0
> > > BioMart.DatasetI:1269:DEBUG> After hash: 0
> > > BioMart.Dataset.TableSet:735:WARN> QUERY SQL:  SELECT main.seqid_key, main.bestname, main.length, main.mass, main.description, main.seqid_key FROM cpas2biomart.protsequencesview__protsequences__main main WHERE (main.seqid_key = '96305') AND (main.seqid_key IN('')) LIMIT 400
> > > DBD::Pg::st execute failed: ERROR:  invalid input syntax for integer: ""
> > > BioMart.Web:2228:DEBUG> Serious error: Error during query execution: ERROR:  invalid input syntax for integer: ""
> > > ========================================================================
> > > 
> > > 
> > > The first dataset "peptidesview__peptidesview" has 9815643 records.
> > > With OFFSET 9815643, the first SQL statement returns zero records,
> > > which leads to an empty IN clause in the second SQL statement.
> > > 
> > > 
> > > Does anyone know the fix for this problem?
> > > 
> > > Thanks,
> > > Denny Chan
-- 
======================================
Syed Haider.
EMBL-European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
======================================
