I have a problem joining two large datasets. When there was no filter, the results came back in a reasonable time.
When I specified a filter on the first dataset to limit the results, the query took a long time to execute. The log file indicated that Biomart-perl got the first 200 records from the second dataset and used those 200 records to filter the first dataset in a separate SQL statement. It then took another 200 records from the second dataset and filtered the first dataset again. The system looped through 200 records at a time until it had exhausted all the records in the second dataset, and it took a while to loop through the 500,000 records there. Even though the filter on the first dataset returned only a couple of records, the query still took a long time because of the loop through the second dataset.

Is there a way to speed up the dataset join, or to change the way Biomart handles a dataset join with a filter?

Thanks,
Denny
