[mart-dev] Multiple datasets join

Chan, Denny (NIH/NCI) [C] Thu, 15 May 2008 09:12:08 -0700

I have a problem joining two large datasets.

When there was no filter, the results came back in a reasonable time.

When I specified a filter on the first dataset to limit the results, the
query took a long time to execute.

The log file indicated that Biomart-perl fetched the first 200 records
from the second dataset and used those 200 records to filter the first
dataset in a separate SQL statement.  It then fetched the next 200
records from the second dataset and filtered the first dataset again.
The system looped through 200 records at a time until it had exhausted
all the records in the second dataset.  Looping through the 500,000
records in the second dataset took a while.

Even though adding a filter to the first dataset returned only a couple
of records, it still took a long time to loop through the second dataset.
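
To make the cost concrete, here is a rough model of the loop as I read it
from the log.  It is only a sketch: the function and variable names are
made up for illustration and are not part of the Biomart API, and a set
lookup in memory stands in for the per-batch SQL statement.  It shows that
the number of statements depends only on the size of the second dataset,
not on how few rows the filter leaves in the first one.

```python
def batched_join(filtered_first_keys, second_keys, batch_size=200):
    """Model of the observed loop (hypothetical names, not Biomart code).

    Walks the second dataset in fixed-size batches and issues one
    filtered lookup against the first dataset per batch.
    """
    first_index = set(filtered_first_keys)  # stands in for the filtered first dataset
    matches = []
    queries = 0
    for start in range(0, len(second_keys), batch_size):
        batch = second_keys[start:start + batch_size]
        queries += 1  # one separate SQL statement per batch in the real system
        matches.extend(key for key in batch if key in first_index)
    return matches, queries


second = list(range(500_000))    # 500,000 records in the second dataset
filtered_first = [42, 123_456]   # the filter leaves only a couple of records
matches, queries = batched_join(filtered_first, second)
print(len(matches), queries)     # 2 2500
```

With 500,000 records and batches of 200, that is 2,500 separate statements
to return 2 rows.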

Is there a way to speed up the dataset join, or to change the way that
Biomart handles a dataset join with a filter?

Thanks,
Denny
