Hi Denny,

Just flip the dataset order so that it filters the second dataset first:
DS_1
DS_2 (with filter)

This should speed up the process :)

cheers
syed

On Thu, 2008-05-15 at 11:57 -0400, Chan, Denny (NIH/NCI) [C] wrote:
> I have a problem joining two large datasets.
>
> When there is no filter, the results come back in reasonable time.
>
> When I specify a filter in the first dataset to limit the result, it
> takes a long time to execute.
>
> The log file indicates that Biomart-perl got the first 200 records
> from the second dataset and used those 200 records to filter the first
> dataset in a separate SQL statement. Biomart-perl then took another
> 200 records from the second dataset and filtered them against the first
> dataset. The system looped through 200 records at a time until it had
> exhausted all the records in the second dataset. It took a while to
> loop through 500,000 records in the second dataset.
>
> Even though adding a filter to the first dataset gives back only a
> couple of records, it still takes a long time to loop through the
> second dataset.
>
> Is there a way to speed up the dataset join, or to change the way that
> Biomart handles a dataset join with a filter?
>
> Thanks,
> Denny

--
======================================
Syed Haider.
EMBL-European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
======================================
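
To illustrate why the order matters, here is a rough Python sketch of the batching behaviour described in the quoted log. It is not Biomart-perl code: the function names, the in-memory key lists, and the "one round trip per batch" counter are all assumptions made for illustration. Only the batch size of 200 and the 500,000-record figure come from the message above.

```python
from itertools import islice

BATCH_SIZE = 200  # batch size reported in the quoted log


def batches(keys, size):
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def batched_join(driver_keys, probe_keys, batch_size=BATCH_SIZE):
    """Join two key collections by looping over the driver in batches.

    Returns (matching_keys, round_trips). Each batch counts as one
    round trip, standing in for one SQL statement (an assumption).
    """
    probe = set(probe_keys)
    matches, round_trips = [], 0
    for batch in batches(driver_keys, batch_size):
        round_trips += 1
        matches.extend(k for k in batch if k in probe)
    return matches, round_trips


unfiltered = range(500_000)   # hypothetical large dataset without a filter
filtered = [42, 123_456]      # hypothetical dataset cut down by a filter

# Large unfiltered dataset drives the loop: 500,000 / 200 batches.
slow = batched_join(unfiltered, filtered)
# Filtered dataset drives the loop: a single batch.
fast = batched_join(filtered, unfiltered)

print(slow[1], fast[1])  # 2500 1
```

Both orders return the same two matches; under these assumptions only the number of batches changes, which is why putting the filtered dataset in the position that drives the loop should help.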
