Cameron Goodale created CLIMATE-248:
---------------------------------------
Summary: PERFORMANCE - Rebinning Daily to Monthly datasets takes a
really long time
Key: CLIMATE-248
URL: https://issues.apache.org/jira/browse/CLIMATE-248
Project: Apache Open Climate Workbench
Issue Type: Improvement
Components: regridding
Affects Versions: 0.1-incubating, 0.2-incubating
Environment: *nix
Reporter: Cameron Goodale
Assignee: Cameron Goodale
Fix For: 0.3-incubating
When I was testing the dataset_processor module I noticed that most tests would
complete in less than 1 second. Then I came across the
"test_daily_to_monthly_rebin" test and it would take over 2 minutes to complete.
The test initially used a 1x1 degree grid covering the globe with a daily time
step for 2 years (730 days).
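For a sense of scale, here is a rough size estimate for that test array. It assumes the 1x1 degree global grid is 180 x 360 cells and that values are stored as 64-bit floats; neither detail is stated in this report, so treat the numbers as approximate.

```python
import numpy as np

# Assumed dimensions: 730 daily steps on a 180 x 360 (1x1 degree) global grid.
days, nlat, nlon = 730, 180, 360

n_values = days * nlat * nlon
# Assumed dtype: float64 (8 bytes per value).
bytes_per_copy = n_values * np.dtype(np.float64).itemsize

print(n_values)              # 47304000
print(bytes_per_copy / 1e6)  # 378.432 (MB per full-size copy of the array)
```

Under those assumptions, every full-size temporary created from the data costs a few hundred megabytes.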
I ran some initial checks, and the slowdown appears to be in the code that
rebins the data, in '_rcmes_calc_average_on_new_time_unit_K':
{code}
mask = np.zeros_like(data)
mask[timeunits!=myunit,:,:] = 1.0
# Calculate missing data mask within each time unit...
datamask_at_this_timeunit = np.zeros_like(data)
datamask_at_this_timeunit[:] = process.create_mask_using_threshold(data[timeunits==myunit,:,:],threshold=0.75)
# Store results for masking later
datamask_store.append(datamask_at_this_timeunit[0])
# Calculate means for each pixel in this time unit, ignoring missing data (using masked array).
datam = ma.masked_array(data,np.logical_or(mask,datamask_at_this_timeunit))
meanstore[i,:,:] = ma.average(datam,axis=0)
{code}
That block is suspect, since the rest of the code only does simple string
parsing and appends to lists. I don't have time to do a deep dive into this
now, and it technically isn't broken, just really slow.
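One possible direction, sketched here and not profiled: assuming the block above runs once per time unit inside a loop (the `meanstore[i,:,:]` assignment suggests it does), each pass builds two full-size arrays with `np.zeros_like(data)` and averages over every time step, even though only the steps where `timeunits == myunit` contribute. Slicing out those steps first keeps the work per pass proportional to one time unit. The function name `rebin_by_slicing` and the "fraction of valid samples" reading of `threshold=0.75` are assumptions for illustration; the real `process.create_mask_using_threshold` may behave differently.

```python
import numpy as np
import numpy.ma as ma


def rebin_by_slicing(data, timeunits, threshold=0.75):
    """Average data (time, lat, lon) over each distinct value in timeunits.

    Hypothetical sketch, not the existing implementation. A pixel is masked
    for a time unit when fewer than `threshold` of its samples in that unit
    are valid (an assumed interpretation of the threshold).
    """
    data = ma.masked_invalid(data)      # treat NaN/inf as missing
    timeunits = np.asarray(timeunits)
    units = np.unique(timeunits)
    means = ma.masked_all((len(units),) + data.shape[1:], dtype=float)

    for i, unit in enumerate(units):
        # Work only on the time steps that belong to this unit.
        chunk = data[timeunits == unit]
        valid_fraction = chunk.count(axis=0) / chunk.shape[0]
        unit_mean = chunk.mean(axis=0)
        means[i] = ma.masked_where(valid_fraction < threshold, unit_mean)

    return units, means


# Tiny example: 4 time steps, a 1 x 2 grid, two "months".
data = np.array([[[1.0, 10.0]],
                 [[3.0, np.nan]],
                 [[5.0, 20.0]],
                 [[7.0, 40.0]]])
timeunits = [201301, 201301, 201302, 201302]
units, means = rebin_by_slicing(data, timeunits)
```

In the example, the second pixel has only half of its samples valid in the first month, so its mean for that month is masked, while every other mean is computed from two samples.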
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira