Re: unstructured text notes: refining the target (#431)

2017-01-23 Thread Taylor, Bradley
Nice breakdown, Dan.

I agree that #573 should be the medium-term goal to roll out to all the GPC 
sites, with limited infrastructure and effort requirements. Not to get too 
caught up in the details, but there needs to be discussion about the different 
note types that may be de-identified on a cohort-by-cohort basis (i.e., imaging 
impressions, encounter, discharge, etc.).

Correct me if I am wrong, but for KU, you have approval for a specific type of 
note at this time? Nursing notes?

Each of our sites may have different approaches and requirements for IRB, 
compliance, and security oversight. This should be addressed sooner rather 
than later, and it will set the stage for the enterprise scale approach.

Regards,
Brad

From: Dan Connolly <dconno...@kumc.edu>
Date: Friday, January 20, 2017 at 12:20 PM
To: Russ Waitman <rwait...@kumc.edu>, Bradley Taylor <btay...@mcw.edu>
Cc: "<gpc-dev@listserv.kumc.edu>" <gpc-dev@listserv.kumc.edu>
Subject: RE: unstructured text notes: refining the target (#431)

I broke #431<https://informatics.gpcnetwork.org/trac/Project/ticket/431> into 
two targets as I see them:

  *   #572 <https://informatics.gpcnetwork.org/trac/Project/ticket/572> 
enterprise scale unstructured text notes de-identified, in 
i2b2<https://informatics.gpcnetwork.org/trac/Project/ticket/572>
  *   #573 <https://informatics.gpcnetwork.org/trac/Project/ticket/573> 
de-identified text notes on a cohort-by-cohort 
basis<https://informatics.gpcnetwork.org/trac/Project/ticket/573>

Most of what is involved in the cohort-by-cohort approach is also needed for 
the enterprise scale approach, but it's a smaller lift and it meets 
requirements for some known use cases. Rolling out #573 in the medium term with 
#572 as a stretch goal is something I can get my head around.

The tickets include use cases; for enterprise scale:

  *   Investigator works on an i2b2 query for a genetic marker, in 
collaboration with an honest broker at a site such as Maren at KUMC
  *   Maren distributes the query to GPC sites via babel and the investigator 
submits a GPC DROC request
  *   each honest broker executes the query against their i2b2 and so on as 
described in RC11 of the GPC phase 1 
proposal<https://informatics.gpcnetwork.org/trac/Project/wiki/DataSecurity#providing-data>
  *   DataBuilder<https://informatics.gpcnetwork.org/trac/Project/wiki/DataBuilder> 
query results from each site include relevant de-identified notes

And for cohort-by-cohort:

Main use case identified at 
HackathonFour<https://informatics.gpcnetwork.org/trac/Project/wiki/HackathonFour>
 was de-identified chart review.



Feasibility of this approach is supported by Tim's success at deploying 
de-id-docker<https://github.com/MCW-BMI/de-id-docker> in a matter of hours 
after George's presentation at 
HackathonFour<https://informatics.gpcnetwork.org/trac/Project/wiki/HackathonFour>.

RE: unstructured text notes: refining the target (#431)

2017-01-20 Thread Dan Connolly
I broke #431<https://informatics.gpcnetwork.org/trac/Project/ticket/431> into 
two targets as I see them:

  *   #572 <https://informatics.gpcnetwork.org/trac/Project/ticket/572> 
enterprise scale unstructured text notes de-identified, in 
i2b2<https://informatics.gpcnetwork.org/trac/Project/ticket/572>
  *   #573 <https://informatics.gpcnetwork.org/trac/Project/ticket/573> 
de-identified text notes on a cohort-by-cohort 
basis<https://informatics.gpcnetwork.org/trac/Project/ticket/573>

Most of what is involved in the cohort-by-cohort approach is also needed for 
the enterprise scale approach, but it's a smaller lift and it meets 
requirements for some known use cases. Rolling out #573 in the medium term with 
#572 as a stretch goal is something I can get my head around.

The tickets include use cases; for enterprise scale:

  *   Investigator works on an i2b2 query for a genetic marker, in 
collaboration with an honest broker at a site such as Maren at KUMC
  *   Maren distributes the query to GPC sites via babel and the investigator 
submits a GPC DROC request
  *   each honest broker executes the query against their i2b2 and so on as 
described in RC11 of the GPC phase 1 
proposal<https://informatics.gpcnetwork.org/trac/Project/wiki/DataSecurity#providing-data>
  *   DataBuilder<https://informatics.gpcnetwork.org/trac/Project/wiki/DataBuilder> 
query results from each site include relevant de-identified notes

And for cohort-by-cohort:

Main use case identified at 
HackathonFour<https://informatics.gpcnetwork.org/trac/Project/wiki/HackathonFour>
 was de-identified chart review.


Feasibility of this approach is supported by Tim's success at deploying 
de-id-docker<https://github.com/MCW-BMI/de-id-docker> in a matter of hours 
after George's presentation at 
HackathonFour<https://informatics.gpcnetwork.org/trac/Project/wiki/HackathonFour>.

Note that the underlying technology is the same one that KUMC adopted from MCW 
for enterprise scale use:

  *   MCW_BMI/unstructured-notes-deidentification<https://bitbucket.org/MCW_BMI/unstructured-notes-deidentification>

Another use case is distributed query a la popmednet. Running federated queries 
"lights out" would involve something like:

  1.  using 
PortQuery<https://informatics.gpcnetwork.org/trac/Project/wiki/PortQuery> to 
run the i2b2 cohort query locally and noting the resulting patient set id
  2.  Invoking the docker container to extract the notes
  3.  Running the distributed analysis code

Combining 2 and 3 into a Jenkins job seems straightforward.
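
As a rough illustration of combining steps 2 and 3, here is a minimal Python 
sketch of what a single job could run. Everything named in it is an 
assumption made for illustration: the image tag, the PATIENT_SET_ID 
environment variable, the /out mount point, and the run_analysis.py script 
do not come from de-id-docker or from this thread. It only shows the shape 
of the job: extract notes for the patient set noted in step 1, then run the 
analysis over the output.

```python
import shlex
import subprocess
from typing import List

# Hypothetical placeholders; none of these names come from the thread.
DEID_IMAGE = "example/de-id-docker"   # image tag is an assumption
ANALYSIS_SCRIPT = "run_analysis.py"   # stand-in for the distributed analysis code


def extract_notes_cmd(patient_set_id: int, out_dir: str) -> List[str]:
    """Step 2: run the container against one i2b2 patient set.

    Passing the id as an environment variable and mounting an output
    directory are assumptions about how the container is configured.
    """
    return [
        "docker", "run", "--rm",
        "-e", f"PATIENT_SET_ID={patient_set_id}",
        "-v", f"{out_dir}:/out",
        DEID_IMAGE,
    ]


def analysis_cmd(out_dir: str) -> List[str]:
    """Step 3: run the analysis code over the extracted notes."""
    return ["python", ANALYSIS_SCRIPT, "--notes-dir", out_dir]


def job_steps(patient_set_id: int, out_dir: str) -> List[List[str]]:
    """Steps 2 and 3 in order, as one job."""
    return [extract_notes_cmd(patient_set_id, out_dir), analysis_cmd(out_dir)]


def run_job(steps: List[List[str]], dry_run: bool = True) -> None:
    """Echo each command; unless dry_run, execute it and stop on the first failure."""
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # 1234 stands in for the patient set id noted in step 1.
    run_job(job_steps(1234, "/tmp/notes"))
```

With dry_run left at its default the script only prints the two commands, 
which makes it easy to review what a Jenkins build step would execute 
before pointing it at real notes.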

--
Dan


From: Dan Connolly
Sent: Wednesday, January 18, 2017 5:19 PM
To: Russ Waitman; Taylor, Bradley
Cc: <gpc-dev@listserv.kumc.edu>
Subject: RE: unstructured text notes: refining the target (#431)

Russ, Brad, w.r.t. Figure 4 of our proposal, what is "NLP derived 
concepts"?

http://frontiersresearch.org/frontiers/sites/default/files/Phase%20II%20Proposal.pdf

--
Dan


From: Dan Connolly
Sent: Tuesday, January 17, 2017 12:24 PM
To: Russ Waitman; Taylor, Bradley
Cc: <gpc-dev@listserv.kumc.edu>
Subject: unstructured text notes: refining the target (#431)

Russ, Brad (when you get back),

I'd like to get a few concrete use cases as targets for this deliverable so 
that we can get tangible experience with what's required and what would be 
nice-to-have.

MCW and IU both report trying the approach of de-identifying all their notes 
and putting them in i2b2 and coming to the conclusion that it was unwieldy. MCW 
now does de-identification on a cohort by cohort basis. I'm not sure how to 
characterize the IU approach.

The cohort-by-cohort basis suffices for GPC needs, as far as I can tell.

For example: suppose investigators specify, in their GPC DROC request, that 
progress notes are part of the data that they want. Then each site runs their 
cohort query and delivers notes for that cohort. The MCW process should work 
well as a recommended method but other methods would be acceptable if a site 
(such as IU) already has a suitable process.
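
To make the example concrete, here is a minimal Python sketch of the 
selection step only: given the patient numbers returned by a site's cohort 
query and the note types named in the request, pick out the matching notes. 
The Note record, its field names, and the note type strings are assumptions 
for illustration, not an i2b2 or MCW schema. The selected notes would still 
have to go through the site's de-identification process before delivery; 
that step is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Note:
    patient_num: int   # patient identifier as returned by the cohort query
    note_type: str     # e.g. "progress", "discharge" (labels are hypothetical)
    text: str          # raw note text, not yet de-identified


def notes_for_cohort(
    notes: Iterable[Note],
    cohort: Set[int],
    note_types: Iterable[str],
) -> List[Note]:
    """Keep notes that belong to a cohort patient and match a requested type."""
    wanted = {t.lower() for t in note_types}
    return [
        n for n in notes
        if n.patient_num in cohort and n.note_type.lower() in wanted
    ]


if __name__ == "__main__":
    sample = [
        Note(1, "progress", "..."),
        Note(1, "discharge", "..."),
        Note(3, "progress", "..."),
    ]
    selected = notes_for_cohort(sample, cohort={1, 2}, note_types=["progress"])
    print([(n.patient_num, n.note_type) for n in selected])
```

Keeping the note types as a parameter reflects that different requests, and 
different site approvals, may cover different kinds of notes.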

Perhaps one concrete case would be: progress notes for the ALS cohort, since 
it's small, then try the breast cancer cohort. Or are there other cohorts where 
we have a customer demand for notes?

For reference: #431<https://informatics.gpcnetwork.org/trac/Project/ticket/431>

--
Dan

___
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev


RE: unstructured text notes: refining the target (#431)

2017-01-18 Thread Dan Connolly
Russ, Brad, w.r.t. Figure 4 of our proposal, what is "NLP derived 
concepts"?

http://frontiersresearch.org/frontiers/sites/default/files/Phase%20II%20Proposal.pdf

--
Dan

