On 15-May-08, at 11:57 AM, Chan, Denny (NIH/NCI) [C] wrote:
I have a problem joining two large datasets.
When there is no filter, the results come back in a reasonable time.
When I specify a filter on the first dataset to limit the results, it
takes a long time to execute.
The log file indicates that Biomart-perl got the first 200 records
from the second dataset and used those 200 records to filter the first
dataset in a separate SQL statement. It then took another 200 records
from the second dataset and filtered the first dataset with them. The
system looped through 200 records at a time until it had exhausted all
the records in the second dataset. It took a while to loop through the
500,000 records in the second dataset.
Even though adding a filter to the first dataset gives back only a couple
of records, it still takes a long time to loop through the second dataset.
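
To put a number on the looping described above, here is a small Python
sketch. It is not Biomart code: the function name and its parameters are
made up for illustration, and it only assumes one follow-up SQL statement
per batch, as the log suggests.

```python
import math


def count_batches(total_records: int, batch_size: int = 200) -> int:
    """Return how many batches are needed to walk through total_records rows.

    Hypothetical helper: assumes one follow-up SQL statement per batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(total_records / batch_size)


# 500,000 records in the second dataset, 200 records per batch
print(count_batches(500_000, 200))
```

Under that assumption, the filter on the first dataset does not reduce the
number of batches, because the batches are driven by the second dataset.
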
Is there a way to speed up the dataset join, or to change the way that
Biomart handles a dataset join with a filter?
Hi Denny,
Try reversing the order of the dataset join and see if this helps.
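
A rough way to see why the order might matter is the toy simulation below,
in Python. It does not use any Biomart API; `batched_join`, the integer
keys, and the in-memory sets are all assumptions standing in for the real
datasets and SQL statements. It only models the behaviour from the log:
the driving dataset is walked in batches, and each batch costs one query
against the other dataset.

```python
from typing import Iterable, List, Set, Tuple


def batched_join(
    driving_keys: Iterable[int],
    probe_keys: Set[int],
    batch_size: int = 200,
) -> Tuple[List[int], int]:
    """Join two key sets by walking driving_keys in batches.

    Each batch counts as one simulated query against the probe side.
    Returns the matching keys and the number of simulated queries.
    """
    matches: List[int] = []
    queries = 0
    batch: List[int] = []
    for key in driving_keys:
        batch.append(key)
        if len(batch) == batch_size:
            queries += 1
            matches.extend(k for k in batch if k in probe_keys)
            batch = []
    if batch:  # final, partially filled batch
        queries += 1
        matches.extend(k for k in batch if k in probe_keys)
    return matches, queries


# Toy data: a large unfiltered dataset and a filtered one with two rows.
large_keys = range(500_000)
filtered_keys = [10, 250_000]

# Original order: the large dataset drives the loop.
slow_matches, slow_queries = batched_join(large_keys, set(filtered_keys))

# Reversed order: the filtered dataset drives the loop.
fast_matches, fast_queries = batched_join(filtered_keys, set(large_keys))

print(slow_queries, fast_queries)
```

Both orders return the same two matches in this toy setup, but the number
of simulated queries differs by a large factor. Whether the real system
behaves the same way depends on how it builds its SQL.
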
a.
Thanks,
Denny