Ibrahim Jarif created CLIMATE-791:
-------------------------------------
Summary: temporal_subset does not work for month_start > month_end
Key: CLIMATE-791
URL: https://issues.apache.org/jira/browse/CLIMATE-791
Project: Apache Open Climate Workbench
Issue Type: Bug
Reporter: Ibrahim Jarif
Assignee: Ibrahim Jarif
Priority: Minor
The following function does not work as expected when *month_start* is
*greater* than *month_end*, i.e. when the requested season wraps across the
year boundary (for example August through January). The problem occurs with
the last year in the *target_dataset*.
{code}
def temporal_subset(month_start, month_end, target_dataset,
                    average_each_year=False):
    """ Temporally subset data given month_index.
    :param month_start: An integer for beginning month (Jan=1)
    :type month_start: :class:`int`
    :param month_end: An integer for ending month (Jan=1)
    :type month_end: :class:`int`
    :param target_dataset: Dataset object that needs temporal subsetting
    :type target_dataset: Open Climate Workbench Dataset Object
    :param average_each_year: If True, output dataset is averaged for each year
    :type average_each_year: :class:'boolean'
    :returns: A temporal subset OCW Dataset
    :rtype: Open Climate Workbench Dataset Object
    """
    if month_start > month_end:
        month_index = range(month_start,13)
        month_index.extend(range(1, month_end+1))
    else:
        month_index = range(month_start, month_end+1)
    dates = target_dataset.times
    months = np.array([d.month for d in dates])
    time_index = []
    for m_value in month_index:
        time_index = np.append(time_index, np.where(months == m_value)[0])
        if m_value == month_index[0]:
            time_index_first = np.min(np.where(months == m_value)[0])
        if m_value == month_index[-1]:
            time_index_last = np.max(np.where(months == m_value)[0])
    time_index = np.sort(time_index)
    time_index = time_index[np.where((time_index >= time_index_first) &
                                     (time_index <= time_index_last))]
    time_index = list(time_index)
    new_dataset = ds.Dataset(target_dataset.lats,
                             target_dataset.lons,
                             target_dataset.times[time_index],
                             target_dataset.values[time_index,:],
                             variable=target_dataset.variable,
                             units=target_dataset.units,
                             name=target_dataset.name)
    if average_each_year:
        nmonth = len(month_index)
        ntime = new_dataset.times.size
        nyear = ntime/nmonth
        averaged_time = []
        ny, nx = target_dataset.values.shape[1:]
        averaged_values =ma.zeros([nyear, ny, nx])
        for iyear in np.arange(nyear):
            # centered time index of the season between month_start and
            # month_end in each year
            center_index = int(nmonth/2)+iyear*nmonth
            if nmonth == 1:
                center_index = iyear
            averaged_time.append(new_dataset.times[center_index])
            averaged_values[iyear,:] = ma.average(
                new_dataset.values[nmonth*iyear:nmonth*iyear+nmonth,:], axis=0)
        new_dataset = ds.Dataset(target_dataset.lats,
                                 target_dataset.lons,
                                 np.array(averaged_time),
                                 averaged_values,
                                 variable=target_dataset.variable,
                                 units=target_dataset.units,
                                 name=target_dataset.name)
    return new_dataset
{code}
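To see where the bounds go wrong, the two index limits can be computed in
isolation. The following is only a sketch that uses NumPy and the standard
library outside of OCW; the array *times* is a stand-in for
*target_dataset.times*, and *window_is_inverted* is a name introduced here for
illustration.

```python
import datetime

import numpy as np

# Stand-in for target_dataset.times: twelve monthly time stamps in one year.
times = np.array([datetime.datetime(2000, m, 1) for m in range(1, 13)])
months = np.array([d.month for d in times])

month_start, month_end = 8, 1

# Same wrap-around month list the function builds (Aug..Dec, then Jan).
month_index = list(range(month_start, 13)) + list(range(1, month_end + 1))

# Same bounds as in temporal_subset: the first occurrence of the first month
# and the last occurrence of the last month.
time_index_first = np.min(np.where(months == month_index[0])[0])
time_index_last = np.max(np.where(months == month_index[-1])[0])

# When the lower bound exceeds the upper bound, no index can satisfy
# (index >= time_index_first) & (index <= time_index_last).
window_is_inverted = time_index_first > time_index_last
print(time_index_first, time_index_last, window_is_inverted)
```

For a single year of data the lower bound (the index of August) lies above the
upper bound (the index of January), so the range filter cannot keep the August
to December entries.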
Example:
{code}
month_start = 8
month_end = 1
time = [2000,1,1,0,0,
2000,2,1,0,0,
2000,3,1,0,0,
2000,4,1,0,0,
2000,5,1,0,0,
2000,6,1,0,0,
2000,7,1,0,0,
2000,8,1,0,0,
2000,9,1,0,0,
2000,10,1,0,0,
2000,11,1,0,0,
2000,12,1,0,0]
{code}
The returned dataset should have
{code}
time = [2000,1,1,0,0,
2000,8,1,0,0,
2000,9,1,0,0,
2000,10,1,0,0,
2000,11,1,0,0,
2000,12,1,0,0]
{code}
But it has
{code}
time = [2000,1,1,0,0]
{code}
All the other time values are removed.
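One possible way to obtain the expected selection is to choose months with a
boolean mask instead of a first/last index window. This is a minimal sketch
under that assumption, not the project's fix: *wrapped_month_mask* is a
hypothetical helper, *times* again stands in for *target_dataset.times*, and
the *average_each_year* branch (which would also have to deal with incomplete
seasons) is not covered.

```python
import datetime

import numpy as np


def wrapped_month_mask(months, month_start, month_end):
    """Hypothetical helper: True where a month lies inside the season.

    If month_start > month_end the season wraps past December, so a month
    qualifies when it is at or after the start OR at or before the end.
    """
    months = np.asarray(months)
    if month_start > month_end:
        return (months >= month_start) | (months <= month_end)
    return (months >= month_start) & (months <= month_end)


# Stand-in for target_dataset.times: twelve monthly time stamps in one year.
times = np.array([datetime.datetime(2000, m, 1) for m in range(1, 13)])
months = np.array([d.month for d in times])

mask = wrapped_month_mask(months, 8, 1)
selected = [d.month for d in times[mask]]
print(selected)
```

With the example input above, the selected months are January and August
through December, which matches the expected time list in this report.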