> On June 17, 2013, 6:48 p.m., Alex Goodman wrote:
> > Kyo and I went over this together and there are a few issues that need to 
> > be considered before committing this:
> > 
> > 1) Although my version of calcTemporalCorrelation is faster, it is also 
> > longer and harder to understand. It also has the disadvantage of not 
> > relying directly on scipy for the calculation so we may lose some 
> > consistency after scipy updates.
> > 2) Jinwon mentioned in an e-mail that several of these functions do not 
> > work for daily data, and this is true. It is not a huge concern now because 
> > our current workflow primarily consists of monthly data, but it is a 
> > disadvantage nonetheless. Luckily, a while back I wrote a function that 
> > does something similar to reshapeMonthlyData() but for daily data; I will 
> > still need to take the time to integrate and test it.
> > 
> > Thus, until these are fully taken care of, we think that we should either 
> > define separate versions of these functions inside metrics.py (e.g. 
> > calcClimSeasonOptimized()) or add a new module in rcmes/toolkit called 
> > something like 'metrics2.py' that has all of my changes, while keeping the 
> > older functions in metrics.py until support for daily data is added. I 
> > understand that this would add some clutter to the existing codebase, so we 
> > would like to hear what everyone else thinks.
> > 
> > What are your thoughts?
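[Editor's note: for readers unfamiliar with the trade-off in point 1, a vectorized temporal correlation along the lines Alex describes typically looks like the minimal numpy sketch below. This is an illustration, not the actual rcmes implementation; the function name and array layout (time as axis 0) are assumptions.]

```python
import numpy as np

def calc_temporal_correlation_vectorized(model, obs):
    """Pearson correlation along the time axis (axis 0) for every
    grid point at once, with no explicit loop over grid cells and
    no per-cell call to scipy.stats.pearsonr."""
    # Anomalies relative to each grid point's time mean
    model_anom = model - model.mean(axis=0)
    obs_anom = obs - obs.mean(axis=0)
    # Covariance over time divided by the product of standard deviations
    cov = (model_anom * obs_anom).mean(axis=0)
    return cov / (model.std(axis=0) * obs.std(axis=0))
```

The speed comes from replacing a nLat x nLon loop of scipy calls with a handful of whole-array operations; the cost, as noted above, is that the formula is now maintained by hand rather than delegated to scipy.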

Alex,

I don't like the idea of fragmenting the codebase by appending the number 2 
to module names, so I vote against creating metrics2.py.

It sounds like this review has grown too large and has started to drift from 
its original intent.  Since the task has grown in scope, it makes sense to 
break it down into smaller, more manageable pieces.  You can open one or more 
JIRA issues to track the additional changes that are not ready and still need 
time to engineer and test.  That would leave you free to scale this Review and 
JIRA issue back to a scope that you can commit on.

For a less abstract example:  If you have 3 functions, out of say 20, that are 
ready to be committed, then scale the JIRA scope down to just those 3 
functions, update the review, and commit them.  Then open more JIRA issues to 
track the changes needed to support the remaining 17 functions you want to 
change.  Whether that means opening 1 more JIRA issue, or 4, or 17, is up to 
you.

The CLIMATE-88 JIRA issue is to improve metrics performance, which is both a 
massive and an ambiguous undertaking.  So my thought is to scale that issue 
back, focus it down to a patch/change you can commit now, and push the rest of 
the code out as future work.

My $0.02


- Cameron


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11873/#review22008
-----------------------------------------------------------


On June 15, 2013, 3:38 a.m., Alex Goodman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11873/
> -----------------------------------------------------------
> 
> (Updated June 15, 2013, 3:38 a.m.)
> 
> 
> Review request for Apache Open Climate, Cameron Goodale and Kyo Lee.
> 
> 
> Description
> -------
> 
> This is a major update to metrics as per CLIMATE-88. There are many changes, 
> most of them related to vectorizing many of the functions using various 
> indexing tricks. If you are not familiar with this terminology, it basically 
> means that operations are performed on an entire array chunk at once, 
> eliminating the need for explicit for loops and improving performance and 
> code readability in most cases. I'll summarize some of the changes that 
> resulted here: 
> 
> -For the most part, absolute performance increases were not that large, but 
> the code became significantly more concise. A few of the functions 
> (particularly correlation calculation) showed very large gains. You can run 
> the updated benchmark_metrics.py script attached here to see for yourself 
> (see the end of this post for more details on that).
> 
> -Thanks to the addition of the reshapeMonthlyData() helper function in my 
> previous patch to misc.py, the explicit use of datetimes as a parameter for 
> many of the functions is no longer needed.
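[Editor's note: as a hedged illustration of why the datetimes parameter becomes unnecessary, reshapeMonthlyData() presumably does something along these lines. This is a sketch, not the actual misc.py code; the (nMonths, nLat, nLon) shape convention and the whole-years requirement are assumptions.]

```python
import numpy as np

def reshape_monthly_data_sketch(data):
    """Hypothetical sketch: fold a (nMonths, nLat, nLon) array that
    covers whole years into (nYears, 12, nLat, nLon). Once the month
    is an explicit axis, climatologies fall out of a single mean()
    with no datetime bookkeeping."""
    n_months, n_lat, n_lon = data.shape
    if n_months % 12 != 0:
        raise ValueError("data must cover whole years")
    return data.reshape(n_months // 12, 12, n_lat, n_lon)
```

For example, `reshape_monthly_data_sketch(data).mean(axis=0)` would yield a 12-month climatology directly, averaging each calendar month across all years.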
> 
> -The names of some variables were changed to adhere to our current coding 
> conventions.
> 
> -Functions that were commented out have been removed, in compliance with our 
> deprecation policy.
> 
> To run the benchmarking script, assuming you have the rcmes directory in your 
> PYTHONPATH, run:
> 
> python benchmark_metrics.py
> 
> This will benchmark the functions that were changed between revisions for 10 
> years of randomly generated data on a 100 x 100 grid. To change the number of 
> years of data generated for the test, do:
> 
> python benchmark_metrics.py nYR
> 
> Where 'nYR' is the number of years of data you wish to use for the benchmark.
> 
> Finally, you can test to see if the revised functions are consistent with 
> their previous versions by running it in test mode:
> 
> python benchmark_metrics.py -t
> 
> This does not cover every possible test case, but from current testing 
> everything seems to work fine. Keep in mind though that they are tested 
> against revisions in the repository and not against Jinwon's upcoming 
> revisions, so if a previously used function was wrong, then so is the revised 
> one.
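[Editor's note: the command-line behavior described above (a default of 10 years, an optional year count `nYR`, and a `-t` test-mode flag) could be parsed roughly as follows. This is a hedged sketch of the interface, not the actual benchmark_metrics.py source; the function name is illustrative.]

```python
import sys

def parse_benchmark_args(argv):
    """Hypothetical sketch of the CLI described above: an optional
    year count, or '-t' to run consistency tests against the old
    function versions. Defaults to 10 years of generated data."""
    n_years, test_mode = 10, False
    if len(argv) > 1:
        if argv[1] == '-t':
            test_mode = True
        else:
            n_years = int(argv[1])
    return n_years, test_mode

if __name__ == "__main__":
    years, test = parse_benchmark_args(sys.argv)
```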
> 
> 
> Diffs
> -----
> 
>   
> http://svn.apache.org/repos/asf/incubator/climate/trunk/rcmet/src/main/python/rcmes/toolkit/metrics.py
>  1492816 
> 
> Diff: https://reviews.apache.org/r/11873/diff/
> 
> 
> Testing
> -------
> 
> -Randomly generated masked arrays via the attached benchmark_metrics.py
> -Some of the TRMM data
> 
> 
> Thanks,
> 
> Alex Goodman
> 
>
