On 23 May 2010 03:32, Bob Jolliffe <bobjolli...@gmail.com> wrote: > On 22 May 2010 19:51, Ola Hodne Titlestad <olati...@gmail.com> wrote: > > On 20 May 2010 18:39, Bob Jolliffe <bobjolli...@gmail.com> wrote: > >> > >> On 20 May 2010 15:56, Bob Jolliffe <bobjolli...@gmail.com> wrote: > >> > 2010/5/20 Ola Hodne Titlestad <olati...@gmail.com>: > >> >> > >> >> 2010/5/20 Lars Helge Øverland <larshe...@gmail.com> > >> >>> > >> >>> Data elements derive their period type from the data sets they are > >> >>> members > >> >>> of. > >> > > >> > Restated (what I just sent Lars only by mistake): a datavalue derives > >> > its period type from the data set of > >> > which its data element is a member :-) > >> > > >> >> > >> >> And when they are members of two datasets with different period types > >> >> they > >> >> have multiple period types right? > >> > > >> > It's important to remain aware that it is values ultimately which have > >> > periods (and hence period types). > >> > > >> > And when you look at a value you can derive its period type in one of > >> > two ways - via dataset or via period. Potentially these could > >> > disagree, The one which derives from its period should be considered > >> > authoritative ie. if the period is 2009-Jan then regardless of what > >> > the dataset might say this really must be monthly. Of course we hope > >> > these always agree. Incidentally the lookup from > >> > datelement-to-dataset-to-period looks like a greater complexity than > >> > the lookup from period->periodType. > >> > > >> >> > >> >> The key thing to look out for in data entry and data import is to > avoid > >> >> overlaps in data values that will cause duplication when aggregating > >> >> data > >> >> periods. > >> >> E.g. if the SAME ORGUNIT registers values for the same data element > for > >> >> two > >> >> different period types that have overlapping periods, e.g. Jan-10 and > >> >> Q1-10. 
> >> >> Then the aggregate values for Q1-10, Jan-June 2010, and 2010 will all > >> >> show > >> >> an incorrect value since the value for Jan-10 is counted twice. > >> > > >> > OK. Thats a good concrete constraint to have. > >> > > >> >> > >> >> One way to enforce this constraint is to monitor which datasets an > >> >> orgunit > >> >> is assigned to, and not allow orgunits to be assigned to two datasets > >> >> that > >> >> have the same data element AND different period types. > >> > > >> > Agreed, Though this constraint should probably be imposed on forms > >> > rather than datasets. > >> > > >> >>As far as I am aware, > >> >> we are not checking for this today. During data import it could be > >> >> checked > >> >> on data element level by looking up the period type the way Bob has > >> >> shown, > >> >> but that sounds like a lot of look ups and time consuming validation, > >> >> or? > >> > > >> > On data import we don't really validate at all, beyond whatever > >> > constraints the db imposes. For efficiency we simply pop the values in > >> > with multiple insert statement. So this validation would have to > >> > happen as a stage before the actual import or would have to be > >> > constrained within the db. In fact it can't be validated easily > >> > before the import as it is dependent on existing values within the db. > >> > > >> >> > >> >> A relatively normal use case that we probably have to find a way to > >> >> support, > >> >> and I think they are struggling with in Vietnam, is that different > >> >> provinces > >> >> can use different period types for the same data elements (even for > >> >> complete > >> >> data sets). E.g. if the national data flow policy says to report on > >> >> immunisation data every quarter, so that becomes the minimum > >> >> requirement for > >> >> all provinces. 
Then some of the provinces decide that all their > >> >> facilities > >> >> have to collect this data monthly anyway, and then at the province > >> >> level > >> >> they simply send the quarterly aggregates to national level (in the > >> >> paper-based or Excel world). At the same time other provinces just > >> >> collect > >> >> quarterly data at the facility level as in the minimum national > >> >> requirement. > >> >> At the national level there is a need to consolidate all this data, > >> >> even > >> >> data by the facility level, so ideally a national DHIS database > should > >> >> be > >> >> able to store both monthly and quarterly raw data values for the same > >> >> data > >> >> elements, but for different orgunits. The national information users > >> >> can > >> >> then easily generate quarterly reports on immunisation for all > >> >> provinces, > >> >> while in some provinces they can do monthly data analysis if they > want > >> >> to > >> >> collect data using that frequency. > >> >> > >> >> We support the above scenario by allowing the same data elements to > be > >> >> assigned to different data sets with different period types, but we > >> >> don't > >> >> control for misuse of this flexibility which can lead to duplication > >> >> and > >> >> inconsistent aggregated data values as pointed out above. > >> > > >> > Thinking further ... I really think the problem arises because we we > >> > have a dataset concept which represents a form and is also used to > >> > constrain periodtypes on dataelements. Thinking of the use case you > >> > have just described, it should be the case that one can have a paper > >> > form which national level expect to collect quarterly, and the same > >> > form be used at a lower level to collect data monthly. 
If we wanted > >> > to mirror that use case electronically we would have to divorce the > >> > form from the periodtype - ie a form would collect datavalues of a > >> > certain period, but the same form could be used in different orgunits > >> > for collecting data at a different frequency.. > >> > > >> > So (leaving dataset aside for the moment) if we can't assign a > >> > periodtype to a form and we can't assign to a dataelement and its too > >> > inefficient to validate on a one by one datavalue basis what is a girl > >> > to do? > >> > > >> > I suspect the correct answer is to refactor datavalue and create a > >> > datavalueset type - note: a set of datavalues rather than a set of > >> > dataelements. Designing out loud, a datavalueset would have the > >> > following fields/attributes: > >> > > >> > 1. a formid - the collection instrument used - roughly corresponds to > >> > current dataset > >> > 2. an orgunitid - where the datavalues come from > >> > 3. a periodid - the period of all the datavalues > >> > couple of other useful attributes I can think of > >> > > >> > Datavalue now becomes slightly simpler (which is always a good thing). > >> > It only has: > >> > value, dataelementid, categorycombooption, datasetid > >> > >> Afterthought: > >> At the risk of adding complexity to what is otherwise a > >> simplification, my life could become even simpler if datavalueset also > >> had a categorycombo attribute, which would imply that a dataset was > >> linked to a formsectionid rather than a formid. > >> > >> So a form has sections. sections have dataelements. And sections > >> have a datavalueset as a model - which implies a uniform categorycombo > >> within the section. > >> > >> There isn't really a need for dataelements to have a categorycombo. > >> And in lots of ways its good that they don't. Then I am reducing > >> complexity rather than adding to it :-) > >> > >> Consider one orgunit has collected malaria deaths disaggregated by > >> age. 
Another has collected values for the same dataelement, but
> >> not disaggregated by age.  The datavalues will come from a
> >> datavalueset so will have a categorycombo.  It is possible to
> >> aggregate or compare these datavalues, from different datavaluesets,
> >> but using the lowest common denominator of categorycombo, i.e. in both
> >> cases you have access to malaria deaths - in the one case you have to
> >> "roll-up" the categorycombo, which does of course assume that the sum
> >> of category options make a sensible whole, but Ola has mentioned this
> >> one many times.
> >>
> >
> > Some really interesting ideas you are bringing up here Bob. I like the kind
> > of flexibility and yet structure this would bring to the data model.
> >
> > One quick question though:
> > How would this fit with the use of data elements and categorycombooptions in
> > metadata expressions like indicators and validation rules that are (and
> > should be) completely independent from data collection structures? E.g.
> > which categories and options should be available for a given data element
> > when setting up an indicator formula? All?
>
> I think it's a question of the "lowest common denominator" of the
> datavalues that you have.  Indicators are calculated from datavalues
> even though we express the calculation in terms of dataelements.
>
> Ivalue = f(de1, de2, de3, ...) / g(de4, de5, ...)
>
> Looking just at the numerator - if the set of datavalues you have
> corresponding to de1, de2 and de3 share the same categorycombo (and
> note that datavalues do have a categorycombo from which their
> categoryoptioncombo is derived), then you can also produce a
> similarly disaggregated indicator value.
>
> If they use different categorycombos (some have age+sex, some have
> hiv_age+sex, and some have just sex), but each of these have at least
> the sex category, then you could produce an indicator value
> disaggregated by sex.
>
> If the categorycombos are a jumble of apples and pears then you can
> produce just the rolled up calculation.

I like this idea.

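The "lowest common denominator" rule described above can be sketched as a set intersection: the categories an indicator can be disaggregated by are those present in every categorycombo of the contributing datavalues. This is only an illustration of the idea, not DHIS 2 code. The class SharedCategories, its methods, and the modelling of a categorycombo as a plain set of category names are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the DHIS 2 API.
class SharedCategories {

    // Assumption: a categorycombo is modelled as a plain set of category names.
    static Set<String> setOf(String... categories) {
        return new LinkedHashSet<>(Arrays.asList(categories));
    }

    // Returns the categories that occur in every combo. An empty result means
    // only the fully rolled-up value can be produced.
    static Set<String> commonCategories(Collection<Set<String>> categoryCombos) {
        Set<String> common = null;
        for (Set<String> combo : categoryCombos) {
            if (common == null) {
                common = new LinkedHashSet<>(combo); // start from the first combo
            } else {
                common.retainAll(combo);             // keep only shared categories
            }
        }
        return common == null ? new LinkedHashSet<String>() : common;
    }

    public static void main(String[] args) {
        List<Set<String>> combos = Arrays.asList(
            setOf("age", "sex"), setOf("hiv_age", "sex"), setOf("sex"));
        System.out.println(commonCategories(combos)); // prints [sex]
    }
}
```

With the three combos from the example in the thread (age+sex, hiv_age+sex, sex) the only shared category is sex, so a sex-disaggregated indicator is possible. A mix with nothing in common yields an empty set, which corresponds to the rolled-up calculation.
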
> What is the implication?  At design time, when you are coding the
> expression, you probably should not include the categoryoptioncombo at
> all.  The indicator is just expressed in terms of dataelements (I
> guess traditional DHIS14 style).  But when you are generating, for
> example, the reporttable, the first pass analyzes the data you have
> selected and suggests - would you like the indicator data
> disaggregated by sex?  Or age+sex?  Or no disaggregation?  So what you
> can report on is determined by the data you've got.  I think that's a
> sound principle.

I can see a few challenges with this principle. In typical implementations of DHIS you would design forms and canned/fixed reports at the same time, before rolling out the installations. If it is impossible to design reports before you have any data values, I can see a problem with this approach. But I guess you would know, from the forms information, the potential datavaluesets, and could therefore allow some disaggregated reports to be prepared even before you have any data values?

Another issue I would like to bring up is performance. In the past we have struggled with, and spent a lot of time on, improving the performance of the datamart, the aggregation of data values. To me it sounds more complicated to have a floating set of disaggregations that needs to be looked up in a potentially huge storage of datavalues, compared to working with a fixed set. Any thoughts on data mart service performance with this proposed design compared to the existing one?

> And I think all of this is completely independent of data collection
> structures.
>
> Of course in practice you will have designed and deployed your
> collection instruments such that all your datavalues for a given
> dataelement will have the same categorycombo.  But if you want to
> compare data over the past five years, and the ministry decided only
> in year two that they wanted to disaggregate by sex and in year 4
> decided to introduce a third sex category, then you could still
> calculate an indicator from all of those datavalues - but by rolling
> up sex category.
>
> I think what we do currently - specifying the categorycombo in the
> indicator expression - is more rigid and more fragile.

Agree, and I think most indicator analysis will be on the data element level anyway (without any disaggregations), so the current design is too complicated and cumbersome to work with.

Ola
----------

> In summary, what we have with categorycombos etc is really quite
> brilliant.  We don't have ragged data.  Our datavalues are stored
> compactly and uniformly.  All this is great.  I think a mistake we may
> have made is attaching categorycombo to the dataelement.  The
> relationship between a categorycombo and a dataelement can and should
> be a transient thing.  I believe the categorycombo should be a
> characteristic of the way we collect the particular datavalues, i.e. a
> characteristic of a particular form.  There is a long conversation
> before where it emerged that part of the original design rationale of
> the categorycombo was indeed related to form layout.  At the time this
> upset me a bit, because I too had bought into the rigid edifice we had
> created.  But in retrospect I think this thinking was absolutely on
> the right track.  Using the categorycombo to specify the
> disaggregation layout of particular form elements makes very good
> sense.  What was also inspired was having the categorycombo as a named
> persisted object in its own right which could be used across different
> dataelements.
> > Cheers > Bob > > > > > Ola > > -------- > > > > > > > > > >> > >> Regards > >> Bob > >> > >> > > >> > We can relatively efficiently validate that a dataset object is not > >> > persisted which has the same formid, orgunitid and an overlapping > >> > period. > >> > > >> > There is no longer any ambiguity about periodtype of a datavalue. > >> > > >> > stored_by, timestamp, comment might go either way. Probably they need > >> > to stay on datavalue. I notice comment is rarely used but its really > >> > useful to have a comment on datavalueset for import purposes. > >> > > >> > 'nuff designing out loud. Got to go. > >> > > >> > Regards > >> > Bob > >> > > >> >> > >> >> > >> >> Ola > >> >> --------- > >> >> > >> >>> > >> >>> On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad > >> >>> <olati...@gmail.com> > >> >>> wrote: > >> >>>> > >> >>>> Hi, > >> >>>> > >> >>>> After Kim Anh's email about the use of the same data elements with > >> >>>> different period types I dug up this old discussion from March > 2009. > >> >>>> > >> >>>> What is the status on this work, or did we not conclude this? > >> >>>> > >> >>>> Ola > >> >>>> ---------- > >> >>>> > >> >>>> 2009/3/20 Bob Jolliffe <bobjolli...@gmail.com> > >> >>>>> > >> >>>>> 2009/3/20 Lars Helge Øverland <larshe...@gmail.com>: > >> >>>>> > > >> >>>>> >> > >> >>>>> >> Yes this is true. But what do you think of the idea to enforce > >> >>>>> >> DataSet membership having a default DataSet for all the > >> >>>>> >> delinquents? > >> >>>>> >> I'm not sure if it can be enforced by the schema, but at least > by > >> >>>>> >> the > >> >>>>> >> application. > >> >>>>> > > >> >>>>> > OK but what does this give us in terms of PeriodType-determining > >> >>>>> > if > >> >>>>> > this > >> >>>>> > default DataSet has a null PeriodType? > >> >>>>> > >> >>>>> Nothing really. The only effect would be you have an index on the > >> >>>>> unassigned DataElements for what its worth. 
Mainly it would be > >> >>>>> useful > >> >>>>> for determining easily the available DataElements which can be > added > >> >>>>> to a DataSet. Maybe its a nonsense idea - I was just trying to > >> >>>>> think > >> >>>>> of ways to make editing DataSets reasonably straightforward. > >> >>>>> > >> >>>>> > > >> >>>>> >> > >> >>>>> >> I don't know if its about right or wrong. There are pros and > >> >>>>> >> cons of > >> >>>>> >> both approaches. What you gain on the swings you lose on the > >> >>>>> >> roundabouts :-) > >> >>>>> >> > >> >>>>> >> In the explicit case the application will have to enforce that > >> >>>>> >> DataSet > >> >>>>> >> members all have the same periodType. > >> >>>>> >> > >> >>>>> >> In the implicit case the application will have to enforce that > >> >>>>> >> DataElements can only be members of multiple groups if these > >> >>>>> >> share > >> >>>>> >> the > >> >>>>> >> same PeriodType. > >> >>>>> >> > >> >>>>> >> The net result as far as the Data API is concerned can and must > >> >>>>> >> be > >> >>>>> >> the > >> >>>>> >> same. Perhaps we should define exactly what extra methods we > >> >>>>> >> want in > >> >>>>> >> the API first. We have already identified a few. Then decide > >> >>>>> >> whether > >> >>>>> >> a database change is necessitated by these. > >> >>>>> > > >> >>>>> > Yes. We need at least service method: > >> >>>>> > > >> >>>>> > Collection<DataElement> getDataElementsByPeriodType( PeriodType > ) > >> >>>>> > > >> >>>>> > and getter on the DataElement object: > >> >>>>> > > >> >>>>> > PeriodType getPeriodType() > >> >>>>> > > >> >>>>> > > >> >>>>> > I guess we could make a branch, start coding and see how it > works > >> >>>>> > out. > >> >>>>> > >> >>>>> Sure. So long as we are adding methods we won't be breaking > >> >>>>> anything > >> >>>>> in terms of backward compatibility. Just enforcing application > >> >>>>> level > >> >>>>> constraints. Then we can really encourage (enforce?) 
upper layers > >> >>>>> to > >> >>>>> strictly interact with the data via the API. Even if this might > >> >>>>> occasionally mean making some lightweight API methods which bypass > >> >>>>> the > >> >>>>> ORM. > >> >>>>> > >> >>>>> > > >> >>>>> > Another issue would arise in the (exotic) situation where > someone > >> >>>>> > assigns a > >> >>>>> > DataElement to a DataSet, enter data for it, then removes it > from > >> >>>>> > the > >> >>>>> > DataElement. The data is there, but how do we deal with it in > >> >>>>> > regard > >> >>>>> > to the > >> >>>>> > mentioned required functionaly (trend analysis, datamart) ? > >> >>>>> > > >> >>>>> > >> >>>>> Yes this gets a bit weird (I presume you mean removes it from the > >> >>>>> DataSet). I'm guessing you haven't lost the data because the > >> >>>>> dataValues each have a PeriodID which in turn is linked to a > >> >>>>> PeriodType. I suppose that (in such an exotic headspace) > >> >>>>> DataElements > >> >>>>> can in fact change their PeriodTypes over time, though I imagine > its > >> >>>>> not a great idea. > >> >>>>> > >> >>>>> The effect would be the same in the explicit relationship case, if > >> >>>>> someone assigns a DataElement to a DataSet, enter data for it, > then > >> >>>>> changes the PeriodType of the DataElement ... > >> >>>>> > >> >>>>> Cheers > >> >>>>> Bob > >> >>>>> > >> >>>>> _______________________________________________ > >> >>>>> Mailing list: > >> >>>>> https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs> > >> >>>>> Post to : dhis2-devs@lists.launchpad.net > >> >>>>> Unsubscribe : > >> >>>>> https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs> > >> >>>>> More help : https://help.launchpad.net/ListHelp > >> >>>> > >> >>> > >> >> > >> >> > >> > > > > > >
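
The constraint raised in the quoted thread (no second datavalueset persisted for the same formid and orgunitid with an overlapping period, for example Jan-10 against Q1-10) could be checked roughly as below. This is a minimal sketch under stated assumptions, not the DHIS 2 implementation: the class names, the in-memory list standing in for the database, and the representation of a period as an inclusive start and end date are all hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for persisted datavaluesets; not DHIS 2 code.
class DataValueSetRegistry {

    // Field names follow the thread (formid, orgunitid, period); types are assumptions.
    static final class DataValueSet {
        final int formId;
        final int orgUnitId;
        final LocalDate start; // first day of the period, inclusive
        final LocalDate end;   // last day of the period, inclusive

        DataValueSet(int formId, int orgUnitId, LocalDate start, LocalDate end) {
            this.formId = formId;
            this.orgUnitId = orgUnitId;
            this.start = start;
            this.end = end;
        }

        // Same form, same orgunit, and the two date ranges intersect.
        boolean conflictsWith(DataValueSet other) {
            return formId == other.formId
                && orgUnitId == other.orgUnitId
                && !start.isAfter(other.end)
                && !other.start.isAfter(end);
        }
    }

    private final List<DataValueSet> stored = new ArrayList<>();

    // Stores the candidate and returns true, or returns false and stores
    // nothing if it would double count against an existing set.
    boolean add(DataValueSet candidate) {
        for (DataValueSet existing : stored) {
            if (existing.conflictsWith(candidate)) {
                return false;
            }
        }
        stored.add(candidate);
        return true;
    }
}
```

Under these assumptions, a monthly set for January 2010 followed by a quarterly set for Q1 2010 from the same orgunit and form is rejected, while the same quarterly set from a different orgunit is accepted. That matches the Vietnam use case in the thread, where provinces report the same form at different frequencies.
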