Nice breakdown, Dan. I agree that #573 should be the medium-term goal to roll out to all the GPC sites, given its limited infrastructure and effort requirements. Not to get too caught up in the details, but we need to discuss the different note types that may be de-identified on a cohort-by-cohort basis (e.g., imaging impressions, encounter, discharge).
Correct me if I am wrong, but for KU, you have approval for a specific type of note at this time? Nursing notes? Each of our sites may have different approaches and requirements for IRB, compliance, and security oversight. This should be addressed sooner rather than later, and it will set the stage for the enterprise scale approach.

Regards,
Brad

From: Dan Connolly <dconno...@kumc.edu>
Date: Friday, January 20, 2017 at 12:20 PM
To: Russ Waitman <rwait...@kumc.edu>, Bradley Taylor <btay...@mcw.edu>
Cc: "<gpc-dev@listserv.kumc.edu>" <gpc-dev@listserv.kumc.edu>
Subject: RE: unstructured text notes: refining the target (#431)

I broke #431 <https://informatics.gpcnetwork.org/trac/Project/ticket/431> into two targets as I see them:

* #572 <https://informatics.gpcnetwork.org/trac/Project/ticket/572> enterprise scale unstructured text notes de-identified, in i2b2
* #573 <https://informatics.gpcnetwork.org/trac/Project/ticket/573> de-identified text notes on a cohort-by-cohort basis

Most of what is involved in the cohort-by-cohort approach is also needed for the enterprise scale approach, but it's a smaller lift and it meets requirements for some known use cases. Rolling out #573 in the medium term with #572 as a stretch goal is something I can get my head around.

The tickets include use cases; for enterprise scale:

* Investigator works on an i2b2 query for a genetic marker, in collaboration with an honest broker at a site, such as Maren at KUMC
* Maren distributes the query to GPC sites via babel and the investigator submits a GPC DROC request
* each honest broker executes the query against their i2b2 and so on as described in RC11 of the GPC phase 1 proposal <https://informatics.gpcnetwork.org/trac/Project/wiki/DataSecurity#providing-data>
* DataBuilder <https://informatics.gpcnetwork.org/trac/Project/wiki/DataBuilder> query results from each site include relevant de-identified notes

And for cohort-by-cohort:

Main use case identified at
HackathonFour <https://informatics.gpcnetwork.org/trac/Project/wiki/HackathonFour> was de-identified chart review. Feasibility of this approach is supported by Tim's success at deploying de-id-docker <https://github.com/MCW-BMI/de-id-docker> in a matter of hours after George's presentation at HackathonFour. Note that the underlying technology is the same as what KUMC adopted from MCW for enterprise scale use:

* MCW_BMI/unstructured-notes-deidentification <https://bitbucket.org/MCW_BMI/unstructured-notes-deidentification>

Another use case is distributed query a la popmednet. Running federated queries "lights out" would involve something like:

1. Using PortQuery <https://informatics.gpcnetwork.org/trac/Project/wiki/PortQuery> to run the i2b2 cohort query locally and noting the resulting patient set id
2. Invoking the docker container to extract the notes
3. Running the distributed analysis code

Combining 2 and 3 into a Jenkins job seems straightforward.

-- Dan

________________________________
From: Dan Connolly
Sent: Wednesday, January 18, 2017 5:19 PM
To: Russ Waitman; Taylor, Bradley
Cc: <gpc-dev@listserv.kumc.edu>
Subject: RE: unstructured text notes: refining the target (#431)

Russ, Brad,

w.r.t. figure 4 of our proposal, what is "NLP derived concepts"?

http://frontiersresearch.org/frontiers/sites/default/files/Phase%20II%20Proposal.pdf

-- Dan

________________________________
From: Dan Connolly
Sent: Tuesday, January 17, 2017 12:24 PM
To: Russ Waitman; Taylor, Bradley
Cc: <gpc-dev@listserv.kumc.edu>
Subject: unstructured text notes: refining the target (#431)

Russ, Brad (when you get back),

I'd like to get a few concrete use cases as targets for this deliverable so that we can get tangible experience with what's required and what would be nice-to-have.

MCW and IU both report trying the approach of de-identifying all their notes and putting them in i2b2 and coming to the conclusion that it was unwieldy. MCW now does de-identification on a cohort-by-cohort basis. I'm not sure how to characterize the IU approach.

The cohort-by-cohort basis suffices for GPC needs, as far as I can tell. For example: suppose investigators specify, in their GPC DROC request, that progress notes are part of the data that they want. Then each site runs their cohort query and delivers notes for that cohort. The MCW process should work well as a recommended method, but other methods would be acceptable if a site (such as IU) already has a suitable process.
Perhaps one concrete case would be: progress notes for the ALS cohort, since it's small, then try the breast cancer cohort. Or are there other cohorts where we have customer demand for notes?

For reference: #431 <https://informatics.gpcnetwork.org/trac/Project/ticket/431>

-- Dan

_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
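
As an illustration of the "lights out" sequence in the January 20 message, here is a minimal sketch of how steps 2 and 3 might be chained in one script that a Jenkins job could call. It assumes the patient set id from step 1 is already known. The image tag, the PATIENT_SET_ID variable, the /output mount point, and the --notes-dir flag are all hypothetical placeholders, not the documented interface of de-id-docker or of any site's analysis code.

```python
import subprocess
from typing import Callable, List

# Hypothetical image tag; substitute whatever a site actually builds.
DEID_IMAGE = "example/de-id-docker:latest"


def extract_notes_command(patient_set_id: int, out_dir: str,
                          image: str = DEID_IMAGE) -> List[str]:
    """Build the `docker run` argument list for step 2 (note extraction)."""
    if patient_set_id <= 0:
        raise ValueError("patient_set_id must be a positive integer")
    return [
        "docker", "run", "--rm",
        "-e", f"PATIENT_SET_ID={patient_set_id}",  # assumed variable name
        "-v", f"{out_dir}:/output",                # assumed mount point
        image,
    ]


def analysis_command(script: str, notes_dir: str) -> List[str]:
    """Build the command for step 3 (analysis over the extracted notes)."""
    # The --notes-dir flag is an assumption about the analysis script.
    return ["python", script, "--notes-dir", notes_dir]


def run_pipeline(patient_set_id: int, out_dir: str, script: str,
                 runner: Callable = subprocess.run) -> List[List[str]]:
    """Run extraction, then analysis; stop at the first failing step."""
    commands = [
        extract_notes_command(patient_set_id, out_dir),
        analysis_command(script, out_dir),
    ]
    for cmd in commands:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so a failed step fails the whole run.
        runner(cmd, check=True)
    return commands
```

A Jenkins job could take the patient set id as a build parameter and call run_pipeline from a shell step. Passing a stand-in for runner lets the command construction be checked without docker installed.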