[ 
https://issues.apache.org/jira/browse/CLIMATE-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913723#comment-13913723
 ] 

Alex Goodman commented on CLIMATE-349:
--------------------------------------

Hi Mazi,

I believe you are correct, in fact for those cases an exception would be raised 
after attempting to change the shape since numpy arrays must maintain a 
constant size (unless the resize() method is used, but this would require 
reallocating a new array, which is undesirable).

However keep in mind that this is a helper function for individual metrics, it 
is generally only used for metrics which often assume that the total number of 
months in the dataset are multiples of 12. For example, if I wanted to 
calculate averages of every month, using a 25 month dataset wouldn't make much 
sense, especially since IIRC the OCW user has control over which time period 
within the data to use and subset it appropriately. It might be a problem for 
calculating seasonal climatology though if the user didn't know to do this 
beforehand, for example if you want to know the climatology for January alone, 
using the last month might be needed. A worst case scenario would probably be 
something like having a dataset that ranges from January 1989 - Jaunuary 2014. 
Now if I wanted to have the most up to date climatology for January (ie take 
every year into account), using reshapeMonthlyData() as currently written would 
not be possible.

Remember that the point of reshapeMonthlyData was to improve not only 
performance but also make the code a lot more readable. I doubt the scenario 
described above would be a common problem in practice, but it is always 
possible to revert a function in metrics to a previous revision that did not 
have the reshapeMonthlyData refactoring but may have been less strict on the 
dimensional requirements of the data. Alternatively, you could also add an if 
statement and use the resize() function to generate the returned data array 
with the new shape, though being sure to use the rounded up value for the 
number of months dimension. Something like this:

if nMonth % 12:
   nshape = np.ceil(25./12.), 12
   data = np.resize(data, tuple(list(nshape) + list(ishape[1:])))
else:
   use same reshaping code as before in this block

What this does is run the same code as before in the ideal case of having 
nMonth being a multiple of 12, otherwise resize will create a deep copy of the 
original data but with the extra space. This way monthly calculations can still 
be vectorized. Now that I think about it, this is probably the best solution to 
the problem since it doesn't require additional refactoring of metrics while 
maintaining the performance and readability benefits we had before. You could 
probably go further and make the function return a masked array to mask out the 
values in the excess parts of the array (eg data[-1, 1:] in this example and 
make a masked array, but this too has its own disadvantages (mainly 
performance).

Sorry for being tl;dr, but I hope this helps.

Alex

> Refactoring "reshapeMonthlyData" from rcmes/utils/misc.py 
> ----------------------------------------------------------
>
>                 Key: CLIMATE-349
>                 URL: https://issues.apache.org/jira/browse/CLIMATE-349
>             Project: Apache Open Climate Workbench
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 0.3-incubating
>            Reporter: Maziyar Boustani
>            Assignee: Maziyar Boustani
>             Fix For: 0.4-incubating
>
>
> Refactoring function "reshapeMonthlyData" from [1] to [2] and providing 
> unittest for that.
> [1]:https://svn.apache.org/repos/asf/incubator/climate/trunk/rcmet/src/main/python/rcmes/utils/misc.py
> [2]:https://svn.apache.org/repos/asf/incubator/climate/trunk/ocw/utils.py



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to