[
https://issues.apache.org/jira/browse/CLIMATE-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913723#comment-13913723
]
Alex Goodman commented on CLIMATE-349:
--------------------------------------
Hi Mazi,
I believe you are correct, in fact for those cases an exception would be raised
after attempting to change the shape since numpy arrays must maintain a
constant size (unless the resize() method is used, but this would require
reallocating a new array, which is undesirable).
However keep in mind that this is a helper function for individual metrics, it
is generally only used for metrics which often assume that the total number of
months in the dataset are multiples of 12. For example, if I wanted to
calculate averages of every month, using a 25 month dataset wouldn't make much
sense, especially since IIRC the OCW user has control over which time period
within the data to use and subset it appropriately. It might be a problem for
calculating seasonal climatology though if the user didn't know to do this
beforehand, for example if you want to know the climatology for January alone,
using the last month might be needed. A worst case scenario would probably be
something like having a dataset that ranges from January 1989 - Jaunuary 2014.
Now if I wanted to have the most up to date climatology for January (ie take
every year into account), using reshapeMonthlyData() as currently written would
not be possible.
Remember that the point of reshapeMonthlyData was to improve not only
performance but also make the code a lot more readable. I doubt the scenario
described above would be a common problem in practice, but it is always
possible to revert a function in metrics to a previous revision that did not
have the reshapeMonthlyData refactoring but may have been less strict on the
dimensional requirements of the data. Alternatively, you could also add an if
statement and use the resize() function to generate the returned data array
with the new shape, though being sure to use the rounded up value for the
number of months dimension. Something like this:
if nMonth % 12:
nshape = np.ceil(25./12.), 12
data = np.resize(data, tuple(list(nshape) + list(ishape[1:])))
else:
use same reshaping code as before in this block
What this does is run the same code as before in the ideal case of having
nMonth being a multiple of 12, otherwise resize will create a deep copy of the
original data but with the extra space. This way monthly calculations can still
be vectorized. Now that I think about it, this is probably the best solution to
the problem since it doesn't require additional refactoring of metrics while
maintaining the performance and readability benefits we had before. You could
probably go further and make the function return a masked array to mask out the
values in the excess parts of the array (eg data[-1, 1:] in this example and
make a masked array, but this too has its own disadvantages (mainly
performance).
Sorry for being tl;dr, but I hope this helps.
Alex
> Refactoring "reshapeMonthlyData" from rcmes/utils/misc.py
> ----------------------------------------------------------
>
> Key: CLIMATE-349
> URL: https://issues.apache.org/jira/browse/CLIMATE-349
> Project: Apache Open Climate Workbench
> Issue Type: Improvement
> Components: metrics
> Affects Versions: 0.3-incubating
> Reporter: Maziyar Boustani
> Assignee: Maziyar Boustani
> Fix For: 0.4-incubating
>
>
> Refactoring function "reshapeMonthlyData" from [1] to [2] and providing
> unittest for that.
> [1]:https://svn.apache.org/repos/asf/incubator/climate/trunk/rcmet/src/main/python/rcmes/utils/misc.py
> [2]:https://svn.apache.org/repos/asf/incubator/climate/trunk/ocw/utils.py
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)