Re: Review Request: metrics improvements

Alex Goodman Sat, 15 Jun 2013 12:20:12 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11873/
-----------------------------------------------------------


(Updated June 15, 2013, 3:38 a.m.)


Review request for Apache Open Climate, Cameron Goodale and Kyo Lee.


Changes
-------

-Fixed a typo in the function name calcAnnualCycleDomainMeans()
-Changed parameter names from 'dataset1' and 'dataset2' to 'evaluationData' and 
'referenceData'
-Updated a few out-of-date comments / docstrings


Description
-------

This is a major update to metrics as per CLIMATE-88. Most of the changes 
involve vectorizing functions using various indexing tricks. If you are not 
familiar with the term, vectorizing means that operations are performed on an 
entire array chunk at once, which eliminates the need for explicit for loops 
and improves performance and code readability in most cases. I'll summarize 
some of the resulting changes here: 

-For the most part, absolute performance increases were not that large, but the 
code became significantly more concise. A few of the functions (particularly 
the correlation calculation) showed very large gains. You can run the updated 
benchmark_metrics.py script attached to this review to see for yourself (see 
the end of this post for more details).

-Thanks to the addition of the reshapeMonthlyData() helper function in my 
previous patch to misc.py, the explicit use of datetimes as a parameter for 
many of the functions is no longer needed.

-The names of some variables were changed to adhere to our current coding 
conventions.

-Functions that were commented out have been removed, in compliance with our 
deprecation policy.
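
To illustrate the kind of vectorization described above, here is a minimal 
sketch that computes a mean annual cycle from monthly data in two ways. It is 
not the code from metrics.py or misc.py. The function names are hypothetical, 
and the sketch assumes the data is a masked array of shape (nT, nY, nX) that 
starts in January and covers whole years. The real reshapeMonthlyData() may 
behave differently.

```python
import numpy as np
import numpy.ma as ma


def annual_cycle_loop(data):
    """Mean annual cycle with an explicit loop over calendar months.

    Hypothetical helper: `data` is a masked array of shape (nT, nY, nX)
    holding monthly values, assumed to start in January.
    """
    n_t, n_y, n_x = data.shape
    result = ma.zeros((12, n_y, n_x))
    for month in range(12):
        # Every 12th time step belongs to the same calendar month.
        result[month] = data[month::12].mean(axis=0)
    return result


def annual_cycle_vectorized(data):
    """Same result without a loop: reshape to (nYR, 12, nY, nX), then average.

    Hypothetical helper; assumes nT is a multiple of 12.
    """
    n_t, n_y, n_x = data.shape
    if n_t % 12 != 0:
        raise ValueError("expected a whole number of years of monthly data")
    reshaped = data.reshape(n_t // 12, 12, n_y, n_x)
    # Averaging over the year axis leaves one value per calendar month.
    return reshaped.mean(axis=0)
```

Because the reshaped array carries the year and month as separate axes, no 
datetime information is needed to group the months, which matches the 
motivation given for the reshaping helper above.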

To run the benchmarking script, assuming you have the rcmes directory in your 
PYTHONPATH, do:

python benchmark_metrics.py

This will benchmark the functions that were changed between revisions for 10 
years of randomly generated data on a 100 x 100 grid. To change the number of 
years of data generated for the test, do:

python benchmark_metrics.py nYR

Where 'nYR' is the number of years of data you wish to use for the benchmark.

Finally, you can test to see if the revised functions are consistent with their 
previous versions by running it in test mode:

python benchmark_metrics.py -t

This does not cover every possible test case, but from current testing 
everything seems to work fine. Keep in mind, though, that the revised functions 
are tested against the revisions in the repository and not against Jinwon's 
upcoming revisions, so if a previously used function was wrong, then so is the 
revised one.
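
As a rough idea of what benchmarking and consistency checking involve, here is 
a minimal sketch. It is not the attached benchmark_metrics.py, and every name 
in it is hypothetical. For simplicity it assumes plain (unmasked) NumPy arrays 
of shape (nT, nY, nX) and uses a temporal correlation as the example metric.

```python
import timeit

import numpy as np


def temporal_corr_loop(evaluation_data, reference_data):
    """Correlation in time at each grid point, one grid point at a time."""
    n_t, n_y, n_x = evaluation_data.shape
    corr = np.zeros((n_y, n_x))
    for j in range(n_y):
        for i in range(n_x):
            corr[j, i] = np.corrcoef(evaluation_data[:, j, i],
                                     reference_data[:, j, i])[0, 1]
    return corr


def temporal_corr_vectorized(evaluation_data, reference_data):
    """Same Pearson correlation, computed for all grid points at once."""
    e_anom = evaluation_data - evaluation_data.mean(axis=0)
    r_anom = reference_data - reference_data.mean(axis=0)
    cov = (e_anom * r_anom).sum(axis=0)
    norm = np.sqrt((e_anom ** 2).sum(axis=0) * (r_anom ** 2).sum(axis=0))
    return cov / norm


def make_random_data(n_years, n_y=100, n_x=100, seed=0):
    """Two random monthly datasets on an n_y x n_x grid (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    shape = (12 * n_years, n_y, n_x)
    return rng.rand(*shape), rng.rand(*shape)


def is_consistent(old_func, new_func, args):
    """True if both implementations agree to within floating point tolerance."""
    return np.allclose(old_func(*args), new_func(*args))


def time_call(func, args, repeat=3):
    """Best wall-clock time in seconds over `repeat` single calls."""
    return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))


if __name__ == "__main__":
    data = make_random_data(n_years=2, n_y=20, n_x=20)
    print("consistent:", is_consistent(temporal_corr_loop,
                                       temporal_corr_vectorized, data))
    print("loop time (s):", time_call(temporal_corr_loop, data))
    print("vectorized time (s):", time_call(temporal_corr_vectorized, data))
```

As noted above, a check like this only shows that the two versions agree with 
each other, not that either one is scientifically correct.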


Diffs (updated)
-----

  
  http://svn.apache.org/repos/asf/incubator/climate/trunk/rcmet/src/main/python/rcmes/toolkit/metrics.py 1492816 

Diff: https://reviews.apache.org/r/11873/diff/


Testing
-------

-Randomly generated masked arrays via the attached benchmark_metrics.py
-Some of the TRMM data


Thanks,

Alex Goodman
