[ https://issues.apache.org/jira/browse/CLIMATE-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743391#comment-13743391 ]

Cameron Goodale commented on CLIMATE-248:
-----------------------------------------

Alex,

I was also toying with that idea, trying to see if we could somehow mask the 
final output (after the average is applied), but I think we have to work with 
masked data to avoid polluting the average calculation.  If the underlying 
mask is empty, though, then we can use the code you suggested.  It might be 
worth testing whether any values are actually masked, and if not, using 
numpy.average instead of ma.average.
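Something like this rough sketch might work (fast_time_average is just an
illustrative name, not existing code, and datam is the masked array from the
snippet quoted below):

{code}
import numpy as np
import numpy.ma as ma

def fast_time_average(datam):
    # Average over the time axis, skipping the masked-array machinery
    # when nothing is actually masked.
    if datam.mask is ma.nomask or not datam.mask.any():
        # No masked values: the plain numpy path is much cheaper.
        return np.average(datam.data, axis=0)
    # Masked values present: ma.average keeps them out of the mean.
    return ma.average(datam, axis=0)
{code}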
                
> PERFORMANCE - Rebinning Daily to Monthly datasets takes a really long time
> --------------------------------------------------------------------------
>
>                 Key: CLIMATE-248
>                 URL: https://issues.apache.org/jira/browse/CLIMATE-248
>             Project: Apache Open Climate Workbench
>          Issue Type: Improvement
>          Components: regridding
>    Affects Versions: 0.1-incubating, 0.2-incubating
>         Environment: *nix
>            Reporter: Cameron Goodale
>            Assignee: Cameron Goodale
>              Labels: performance
>             Fix For: 0.3-incubating
>
>         Attachments: inital_profile.txt, test.py
>
>
> When I was testing the dataset_processor module I noticed that most tests 
> completed in less than 1 second.  Then I came across the 
> "test_daily_to_monthly_rebin" test, which took over 2 minutes to complete.
> The test initially used a 1x1 degree grid covering the globe and a daily 
> time step for 2 years (730 days).
> I ran some initial checks, and the lag appears to be in the code where the 
> data is rebinned, inside '_rcmes_calc_average_on_new_time_unit_K'.
> {code}
>                 mask = np.zeros_like(data)
>                 mask[timeunits!=myunit,:,:] = 1.0
>                 # Calculate missing data mask within each time unit...
>                 datamask_at_this_timeunit = np.zeros_like(data)
>                 datamask_at_this_timeunit[:] = process.create_mask_using_threshold(data[timeunits==myunit,:,:], threshold=0.75)
>                 # Store results for masking later
>                 datamask_store.append(datamask_at_this_timeunit[0])
>                 # Calculate means for each pixel in this time unit, ignoring missing data (using masked array).
>                 datam = ma.masked_array(data, np.logical_or(mask, datamask_at_this_timeunit))
>                 meanstore[i,:,:] = ma.average(datam, axis=0)
> {code}
> That block is suspect, since the rest of the code is doing simple string 
> parsing and appending to lists.  I don't have the time to do a deep dive 
> into this now, and it technically isn't broken, just really slow.
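>
> A rough sketch of one direction we could test (my assumption, not a 
> committed fix): slice out the matching time steps first, so the threshold 
> mask and ma.average only touch this time unit's data instead of the full 
> 730-day array.  This assumes create_mask_using_threshold returns a 2-D 
> (lat, lon) mask, as the broadcast assignment above suggests.
> {code}
>                 # Work only on this time unit's slice of the data.
>                 subset = data[timeunits==myunit,:,:]
>                 # 2-D missing-data mask computed from the slice alone.
>                 submask = process.create_mask_using_threshold(subset, threshold=0.75)
>                 datamask_store.append(submask)
>                 # Broadcast the 2-D mask across the slice's time axis.
>                 datam = ma.masked_array(subset, np.broadcast_to(submask, subset.shape))
>                 meanstore[i,:,:] = ma.average(datam, axis=0)
> {code}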

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
