Hi Lars, I have been looking at the aggregation process and have a few questions.
I chose a range of months (January-December 2009) and aggregated all data from facilities to districts. Monitoring the logs, I saw that a greater number of periods than expected were aggregated during the process, which seemed strange to me. I dug deeper and pulled out the resulting set of periods [1]. As you can see, there are a number of periods that overlap the requested range, but which are not contained in any of the requested aggregation periods [2].

It would seem that DHIS2 is asking for all data that "overlaps" the desired range of time periods. Instead, it should ask only for the time periods requested, plus any time periods that those aggregation periods are composed of. For instance, I have a quarterly time period with periodid 20684. This query gives me a list of all periods that this period may be composed of:

    SELECT periodid
    FROM period
    WHERE startdate >= (SELECT startdate FROM period WHERE periodid = 20684)
      AND enddate <= (SELECT enddate FROM period WHERE periodid = 20684)
      AND periodid <> 20684;

So, I would expect the aggregation engine to analyze the requested time periods and data elements, and then decompose each of those periods individually via a query such as the one above, to get all "dependent" time periods. This process would repeat until the time periods could not be decomposed any further, or until it no longer makes sense to do so.

I would also expect that, for the indicators/data elements chosen by the user, only the periodicity of the data set would be used to determine the base time period from which to begin aggregation. For instance, I might choose to aggregate data that has been entered with a monthly periodicity up to quarters. The aggregation engine should know, based on my choice of data elements (e.g. monthly), that all monthly periods for that indicator/data element need to be retrieved and then aggregated into the destination time period.

Could you maybe comment on why this happens this way?
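To make the idea concrete, here is a minimal sketch in Python of the containment-based decomposition I have in mind. The period ids, dates, and the in-memory `PERIODS` table are all made up for illustration (only periodid 20684 comes from my example above, and I am assuming it represents Q1 2009); this is not the actual DHIS2 schema or code, just the logic the SQL query would drive:

```python
from datetime import date

# Hypothetical stand-in for the period table: periodid -> (startdate, enddate).
# Ids and dates are illustrative only.
PERIODS = {
    20684: (date(2009, 1, 1), date(2009, 3, 31)),   # assumed: Q1 2009
    1: (date(2009, 1, 1), date(2009, 1, 31)),       # January 2009
    2: (date(2009, 2, 1), date(2009, 2, 28)),       # February 2009
    3: (date(2009, 3, 1), date(2009, 3, 31)),       # March 2009
    4: (date(2008, 12, 15), date(2009, 1, 14)),     # overlaps Q1, but not contained
}

def contained_periods(period_id):
    """Mirror of the SQL: periods wholly contained within the given period."""
    start, end = PERIODS[period_id]
    return {pid for pid, (s, e) in PERIODS.items()
            if pid != period_id and s >= start and e <= end}

def decompose(period_ids):
    """Repeatedly decompose until no period yields any new contained periods."""
    result = set(period_ids)
    frontier = set(period_ids)
    while frontier:
        new = set()
        for pid in frontier:
            new |= contained_periods(pid) - result
        result |= new
        frontier = new
    return result

# Period 4 merely overlaps Q1 2009, so it is excluded; 1-3 are contained.
print(sorted(decompose({20684})))  # -> [1, 2, 3, 20684]
```

The key difference from what the logs suggest is the containment test (`s >= start and e <= end`) rather than a mere overlap test, which would also pull in period 4 above.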
It would seem to be wasteful, as one of the most limiting steps in terms of performance seems to be related to input/output of the data. If we can decrease this, it should speed things up a bit.

Regards,
Jason

[1] http://pastebin.com/XAEtCdzY
[2] http://pastebin.com/uFMgcxtT

--
Jason P. Pickering
email: jason.p.picker...@gmail.com
tel: +260968395190

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to     : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help   : https://help.launchpad.net/ListHelp