Thanks, Alex, for diving into this first and stating things so eloquently. I agree with everything Alex wrote and would add the following:
> If we move the records from VITAL to OBS_CLIN, we need to merge the
> valuesets for the provenance fields. If we do that, OBSCLIN_SOURCE would
> contain OD (Order/EHR), RG (Registry/ancillary system) and HC (Healthcare
> delivery setting).
> There is a fair amount of overlap between these terms. We are proposing to
> deprecate OD and RG and utilize HC instead (we will make the same change to
> OBSGEN_SOURCE as well).
> Any concerns with this change?

Registries normally contain chart-abstracted data, which can be useful but also adds an additional step where human error can creep in. I believe it would be useful to keep the distinction between potentially interpreted data and raw data from the EHR.

> Addition of Result_text

This would likely be a free-text field, which may allow PHI to slip through. We would not recommend adding a free-text field to a limited data set.

> Addition of Raw Condition Text

This value is a free-text field at one of our institutions and a value set at another. We could use the value set, but would not be able to add a free-text string to a limited data set.

-----Original Message-----
From: Stoddard, Alexander <[email protected]>
Sent: Thursday, September 24, 2020 10:02 PM
To: [email protected]
Cc: Taylor, Bradley <[email protected]>; [email protected]; Manuel, Laura S M <[email protected]>
Subject: CDM 6.0 review responses from MCW

Hello GPC-DEV,

MCW agreed to review the CDM 6.0 spec during the dev call on 2020-09-22. The replies to DRNOC, using an Excel file template (available at https://pcornet.imeetcentral.com/drnoc-workgroups/folder/WzIwLDEzMTI2ODA5XQ/), have been requested by end of day Friday 2020-09-25. Below is a text version of the responses that I will be sending on behalf of MCW.

Main questions seeking feedback
-------------------------------------

> As the CDM has grown in size, the image included in the specification (Page 9)
> conveys less and less information.
> Any concerns if it is deleted?
Not a concern, but a highlighted list of changed tables and new columns on a single page is useful.

> Suggestions on what we might consider as a replacement?

A machine-readable, diff-able and version-controlled schema definition would be very useful. Potentially this would allow tool-assisted SQL generation for the different RDBMSs, or even generation of visualizations. A candidate format for such a schema definition is the one used by the SQLAlchemy Python package: https://docs.sqlalchemy.org/en/13/core/metadata.html

> Are there any concerns about the strategy to deprecate VITAL and move the
> records to OBS_CLIN?

OBS_CLIN is a much better data model for vitals. However, moving the distinct columns of the VITAL table into a single column, which requires different value sets for different qualitative variables, will be easier with a more agile and open process for value-set definition during the transition. Open appending of additional values to a version-controlled value set reference would give projects much greater flexibility to adopt additional tests and observations throughout the CDM lifecycle without any loss of specificity, accuracy or backwards compatibility. This is especially true of the _QUAL columns, which will hold values for many different results/observations, unlike the domain-specific columns historically defined using the current process (e.g. RACE in the DEMOGRAPHIC table and SMOKING in the VITAL table). In general, qualitative value sets should be defined on the codes that identify a given observation row, not on the whole _QUAL column.

> If we move the records from VITAL to OBS_CLIN, we need to merge the
> valuesets for the provenance fields. If we do that, OBSCLIN_SOURCE would
> contain OD (Order/EHR), RG (Registry/ancillary system) and HC (Healthcare
> delivery setting).
> There is a fair amount of overlap between these terms. We are proposing to
> deprecate OD and RG and utilize HC instead (we will make the same change to
> OBSGEN_SOURCE as well).
> Any concerns with this change?
EHR vs. Registry seems like a valid source distinction. In our experience, the source fields are most often useful for tracing data during QC of individual records, rather than for research and aggregation of the data. A richer value set may therefore benefit sites.

> Is the description for Telehealth encounters sufficient, or is more detail
> needed?

The description is sufficient, but the real issue is likely how specifically sites' source systems record these encounters (vs. routine telephone or other electronic communications).

> If we remove the VALUESET and VALUESET DESCRIPTOR columns from the
> FIELDS tab of the parseable file, would that pose a problem? (The
> VALUESETS tab would remain unchanged)

No problem. The data in these columns is much easier to use as represented in the VALUESETS tab. A flag or categorical value indicating that a field uses a value set from the VALUESETS tab would be useful.

General Comments
---------------------
None

Value Sets
-----------
See comments on the VITAL transition.

LAB_HISTORY table
---------------------
No particular issues with the schema definition, but MCW remains very doubtful about the utility and accuracy achievable with this table versus a centrally held one maintained by DRNOC.

If a lab test is stable and well defined enough for population reference ranges (but doesn't have individual test normal ranges defined for a particular source), then a centrally maintained reference fallback is reasonable. When an assay does not have generalizable normal ranges, e.g. when it is run relative to a variable, arbitrary reference and/or varies from machine to machine, then a per-record reference for the normal range is needed, and this table will be insufficiently granular and misleading.

The spec reads 'Every record in this table should be unique.', but this is trivially true given that each row has an arbitrary LABHISTORYID and uniqueness is otherwise undefined.
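To make the uniqueness point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column list is a hypothetical subset loosely modeled on LAB_HISTORY, and the natural key chosen (LAB_LOINC, SEX, RACE, age band, RESULT_UNIT, PERIOD_START) is our assumption, not something the spec defines. With only the surrogate LABHISTORYID as primary key, two logically identical rows are both accepted; a UNIQUE constraint on a natural key rejects the duplicate.

```python
import sqlite3

# Hypothetical subset of LAB_HISTORY columns, for illustration only;
# names and types should be checked against the CDM 6.0 spec.
COLUMNS = """
    LABHISTORYID    TEXT PRIMARY KEY,  -- arbitrary surrogate key
    LAB_LOINC       TEXT NOT NULL,
    SEX             TEXT NOT NULL,
    RACE            TEXT NOT NULL,
    AGE_MIN_WKS     INTEGER NOT NULL,
    AGE_MAX_WKS     INTEGER NOT NULL,
    RESULT_UNIT     TEXT NOT NULL,
    NORM_RANGE_LOW  TEXT,
    NORM_RANGE_HIGH TEXT,
    PERIOD_START    TEXT NOT NULL
"""

# Assumed natural key: one reference range per test, population band and period.
# NOT NULL on these columns matters: SQLite treats NULLs as distinct in UNIQUE.
NATURAL_KEY = ("LAB_LOINC, SEX, RACE, AGE_MIN_WKS, AGE_MAX_WKS, "
               "RESULT_UNIT, PERIOD_START")


def create_lab_history(conn, enforce_natural_key):
    """Create the table, optionally adding a UNIQUE constraint on the natural key."""
    ddl = COLUMNS
    if enforce_natural_key:
        ddl += f", UNIQUE ({NATURAL_KEY})"
    conn.execute(f"CREATE TABLE LAB_HISTORY ({ddl})")


def load(rows, enforce_natural_key):
    """Insert rows into a fresh in-memory table and return how many were accepted."""
    conn = sqlite3.connect(":memory:")
    create_lab_history(conn, enforce_natural_key)
    accepted = 0
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO LAB_HISTORY VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # row violates the UNIQUE constraint
    conn.close()
    return accepted


# Two rows differing only in their surrogate ID (placeholder values, not real ranges).
ROWS = [
    ("LH1", "EXAMPLE-LOINC", "F", "NI", 0, 5200, "mg/dL", "1", "2", "2020-01-01"),
    ("LH2", "EXAMPLE-LOINC", "F", "NI", 0, 5200, "mg/dL", "1", "2", "2020-01-01"),
]

print(load(ROWS, enforce_natural_key=False))  # 2: "unique" only by LABHISTORYID
print(load(ROWS, enforce_natural_key=True))   # 1: the logical duplicate is rejected
```

Whatever natural key DRNOC settles on, stating it in the spec would give 'Every record in this table should be unique.' a testable meaning.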
New / Modified fields
------------------------
LAB_RESULT_CM RESULT_TEXT - Implementation concern: in MCW's experience, SAS expands varchar columns to their maximum width, so this will bloat table size if the column is sparsely populated with large values. A separate relational table with text results keyed by LAB_RESULT_CM_ID would be much more efficient.
ENCOUNTER ENCOUNTER_TYPE - No comment
ENCOUNTER ADMITTING_SOURCE - No comment
CONDITION CONDITION_SOURCE - Guidance on the expected source of Chief Complaint would be useful. Should it always be linked to an ENCOUNTER?
CONDITION RAW_CONDITION_TEXT - No comment
OBS_CLIN OBSCLIN_START_DATE - No comment
OBS_CLIN OBSCLIN_START_TIME - No comment
OBS_CLIN OBSCLIN_STOP_DATE - No comment
OBS_CLIN OBSCLIN_STOP_TIME - No comment
OBS_CLIN OBSCLIN_SOURCE - May be better to maintain the EHR / Registry source distinction
OBS_CLIN OBSCLIN_ABN_IND - No comment
OBS_GEN OBSGEN_START_DATE - No comment
OBS_GEN OBSGEN_START_TIME - No comment
OBS_GEN OBSGEN_STOP_DATE - No comment
OBS_GEN OBSGEN_STOP_TIME - No comment
OBS_GEN OBSGEN_SOURCE - May be better to maintain the EHR / Registry source distinction
OBS_GEN OBSGEN_ABN_IND - No comment
OBS_GEN OBSGEN_TABLE_MODIFIED - No comment
HARVEST CDM_VERSION - No comment
HARVEST TOKEN_ENCRYPTION_KEY - Would TOKEN_ENCRYPTION_KEY_NAME be a better name? Please give an example in the guidance.
HARVEST OBSCLIN_START_DATE_MGMT - No comment
HARVEST OBSCLIN_STOP_DATE_MGMT - No comment
HARVEST OBSGEN_START_DATE_MGMT - No comment
HARVEST OBSGEN_STOP_DATE_MGMT - No comment

Best regards,

Alex Stoddard
Programmer/Analyst
Biomedical Informatics
Clinical & Translational Science Institute
Medical College of Wisconsin
[email protected]
I am currently working remotely
